Ollama

Ollama is a tool that can run LLMs locally.

What Is Ollama?

Ollama has recently become a widely used tool among AI and LLM developers.

Ollama is an open source platform that makes it easy to run and manage large language models (LLMs) in a local environment.
In other words, without using a cloud model such as the OpenAI API, it lets you load and use models such as Llama, Mistral, Gemma, and CodeLlama on your own PC (Mac/Linux/Windows).

Main Features of Ollama

  1. Local execution support
    • Models can run even without an internet connection
    • Useful for corporate security and personal privacy
  2. Simple model deployment
    • Run a model with a single command such as ollama run llama3
    • Manages models as packages, Docker-style, using a configuration file called a Modelfile
  3. Support for multiple models
    • Can download and run many models such as Meta LLaMA, Mistral, Gemma, Code Llama, and Phi
  4. API support
    • Exposes a local REST API server (http://localhost:11434/api/generate) that other apps can call
    • Can be used like a local OpenAI API server
  5. GPU optimization
    • Supports MPS (Mac) and CUDA (NVIDIA GPU), making it fast
    • Can also run on CPU, but more slowly
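The local REST API mentioned above can be called from any language. As a minimal sketch, here is how a request to the default endpoint might be built in Python using only the standard library (the endpoint URL matches the one listed above; the model name and prompt are just examples, and actually sending the request assumes an Ollama server is running locally):

```python
import json
import urllib.request

# Default local Ollama endpoint (see the API support feature above)
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's generate endpoint.

    stream=False asks for a single JSON object instead of a stream.
    """
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )

# With an Ollama server running locally, this would print the answer:
# with urllib.request.urlopen(build_request("llama3", "Hello")) as resp:
#     print(json.loads(resp.read())["response"])
```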

Using Ollama

Installation

After installation on macOS, running Ollama displays an Ollama icon in the menu bar.

Running a Model

ollama run llama3

On first execution, the model is automatically downloaded and then run. The llama3 model is 4.7 GB.

Downloading Models

Search for models on the official site and install the model you want to use.

API Call (for example, curl)

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain quantum computing in simple terms"
}'
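By default (without "stream": false), /api/generate streams its answer back as one JSON object per line, each carrying a piece of the text in a "response" field, with "done": true on the final object. A small sketch of collecting such a stream (the chunk contents below are illustrative, not real model output):

```python
import json

def collect_stream(lines):
    """Join the 'response' chunks from Ollama's streaming output.

    Each line is a JSON object; the answer arrives piecewise in
    'response', and the last object has 'done': true.
    """
    text = []
    for line in lines:
        obj = json.loads(line)
        text.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(text)

# Illustrative chunks in the shape Ollama streams back:
chunks = [
    '{"model":"llama3","response":"Quantum ","done":false}',
    '{"model":"llama3","response":"computing...","done":true}',
]
print(collect_stream(chunks))  # Quantum computing...
```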

Model Management

  • ollama list -> Check installed models
  • ollama pull mistral -> Download a new model
  • ollama create mymodel -f Modelfile -> Create a custom model
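A Modelfile works much like a Dockerfile for models. A minimal sketch of what one might contain (the base model, temperature value, and system prompt here are illustrative choices, not required settings):

```
# Base model to build on
FROM llama3

# Sampling temperature (lower = more deterministic)
PARAMETER temperature 0.7

# System prompt baked into the custom model
SYSTEM "You are a concise coding assistant."
```

Saving this as Modelfile, building with ollama create mymodel -f Modelfile, and then running ollama run mymodel gives a reusable model preconfigured with that behavior.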

Comparing Ollama with Other LLM Execution Frameworks

  Tool                   | Features
  -----------------------|------------------------------------------------------------------------------
  Ollama                 | Simplest installation and execution, local API support, model package management
  LM Studio              | GUI-based, intuitive model selection and execution
  vLLM                   | Optimized for high-performance server execution, mainly used for large-scale deployments
  Text Generation WebUI  | Runs various models and provides a Web UI
  OpenAI API             | Cloud-based with access to the latest models, but has cost and privacy issues

Use Cases

  • Building a local AI assistant in a development environment
  • Building an internal chatbot connected to secure company data
  • Building RAG systems by integrating with frameworks such as LangChain and LlamaIndex
  • Prototyping: Experimenting quickly without using OpenAI API costs
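The RAG idea in the list above can be sketched without any framework: retrieve the most relevant document, then build a prompt that combines it with the question before sending it to the local model. This is a toy illustration, not the LangChain or LlamaIndex API; the retriever here is a deliberately naive word-overlap scorer, and the documents are made up:

```python
def retrieve(query, docs):
    """Pick the document sharing the most words with the query (toy retriever)."""
    qwords = set(query.lower().split())
    return max(docs, key=lambda d: len(qwords & set(d.lower().split())))

def build_rag_prompt(query, docs):
    """Prepend the retrieved context to the question, RAG-style."""
    context = retrieve(query, docs)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Ollama serves local models over a REST API on port 11434.",
    "Docker packages applications into containers.",
]
prompt = build_rag_prompt("What port does Ollama use?", docs)
# The resulting prompt would then be sent to the local model,
# e.g. via the /api/generate endpoint shown earlier.
```

A real system would replace the word-overlap scorer with vector embeddings, which is exactly what LangChain and LlamaIndex provide on top of Ollama.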

Summary

Ollama is a platform like “Docker for LLMs” that makes it easy to run LLMs locally. It can be used for many purposes, from personal research to enterprise chatbots.