How to Get Started with Ollama and Run Powerful AI Models Locally for Free


In the growing world of AI, the ability to run large language models (LLMs) locally offers privacy, flexibility, and cost savings. One of the best tools to do this is Ollama, a free and open-source solution that allows you to download and run models like LLaMA, Mistral, and others entirely on your machine.

Whether you’re a developer, AI enthusiast, or just want to experiment without using online services like ChatGPT, this guide walks you through setting up and using Ollama effectively.


💾 Installing Ollama

  1. Go to ollama.com and download the installer for your OS (Windows, macOS, Linux).
  2. Install it just like any regular app.
  3. On Windows, you can launch Ollama from the Start menu; on macOS or Linux, use terminal commands.

Once installed, run ollama in your terminal to verify it’s set up correctly; with no arguments, it prints its usage help.
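
Another quick sanity check is to print the installed version (the --version flag ships with current releases):

ollama --version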


🧑‍💻 Running Your First Model

To run a model:

ollama run llama2

Ollama will automatically download the model if it’s not already present and start an interactive session.
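
Inside the session, you type a prompt at the >>> prompt and the reply streams back; type /bye to exit. Roughly (the model’s actual answer will vary):

>>> What is the capital of France?
... the model’s answer streams here ...
>>> /bye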

You can switch to another model, such as Mistral, just as easily:

ollama run mistral

List your installed models:

ollama list

Remove models:

ollama rm <model-name>

🚀 Using Ollama’s HTTP API

Ollama provides a built-in HTTP server so you can interact with models through code.

Start the server:

ollama serve
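
If the command appears to hang, that’s expected: the server runs in the foreground (and if you installed the desktop app, it may already be running in the background). From a second terminal, you can confirm it’s reachable by listing your installed models over HTTP:

curl http://localhost:11434/api/tags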

This runs the API locally on port 11434 by default. You can then interact with it from code, for example with Python:

import json
import requests

url = "http://localhost:11434/api/chat"
payload = {
    "model": "mistral",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ]
}

# By default the chat endpoint streams one JSON object per line
response = requests.post(url, json=payload, stream=True)
for line in response.iter_lines():
    if not line:
        continue  # skip blank keep-alive lines
    chunk = json.loads(line)
    # Each chunk carries the next fragment of the assistant's reply
    print(chunk.get("message", {}).get("content", ""), end="", flush=True)
print()  # final newline after the stream ends
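
If you’d rather receive one complete answer instead of a stream, the same endpoint accepts a "stream": false flag; a minimal non-streaming variant:

import requests

url = "http://localhost:11434/api/chat"
payload = {
    "model": "mistral",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "stream": False,  # ask the server for a single JSON object
}
response = requests.post(url, json=payload)
print(response.json()["message"]["content"])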

🔧 Customizing Models

You can create custom personalities or configurations by writing a Modelfile:

FROM llama2
PARAMETER temperature 0.8
SYSTEM "You are Mario from Super Mario Bros., acting as a helpful assistant."

Then create the model from it (here the Modelfile is saved as ./Modelfile):

ollama create mario -f ./Modelfile

Run it:

ollama run mario
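
A model created this way is also reachable through the HTTP API under the name you gave it; a short sketch, assuming the mario model from above and a running server:

import requests

# Address the custom model by the name passed to `ollama create`
payload = {
    "model": "mario",
    "messages": [{"role": "user", "content": "Introduce yourself."}],
    "stream": False,
}
response = requests.post("http://localhost:11434/api/chat", json=payload)
print(response.json()["message"]["content"])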

📊 Hardware Considerations

Keep in mind:

  • LLaMA 2 7B: ~8GB RAM minimum
  • Mistral 7B: ~12-16GB RAM
  • LLaMA 65B or 70B: 48GB+ RAM or use quantized versions

You can opt for lighter models or pull 4-bit quantized versions for better performance on low-end machines, as shown below.
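
For example, many models in the Ollama library publish 4-bit quantized tags (the exact tag names vary by model, so check ollama.com/library before pulling):

ollama pull llama2:7b-chat-q4_0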


🌟 Why Use Ollama?

  • Privacy: Your data stays on your device
  • Cost: Completely free
  • Flexibility: Use any open-source model you want
  • Offline Capability: No need for internet access
