How to Fine-Tune Your Own LLM Locally with Ollama and MLX (Apple)


AI models are incredible. Their ability to respond to just about any question with relevant and human-like answers feels like something out of science fiction. But as good as they are, we’re still on a quest to make them better—more personalized, more useful, and more aligned with the way we work and communicate.

The models you download from Hugging Face or Ollama are designed to be general-purpose—good for everyone. But what if you want a model that’s specifically good for you? That’s where fine-tuning comes in.

What Is Fine-Tuning?

Fine-tuning isn’t necessarily about teaching a model brand new information (though that’s possible). It’s about adjusting how the model responds: the tone it uses, the structure of its output, the way it answers certain types of questions.

For example, maybe you want a model that writes emails in your tone, generates SQL queries based on your database schema, or summarizes documents in a way that fits your workflow.

There are two primary ways to “customize” a model:

  1. Prompt engineering — Give the model context at inference time.
  2. Fine-tuning — Actually modify the model’s weights to permanently change how it responds.
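To make the distinction concrete, the first approach boils down to packing your context and examples into every single request. A minimal sketch (the template, instructions, and example pairs here are illustrative, not from any particular library):

```python
def build_prompt(instruction: str, context: str, examples: list[tuple[str, str]]) -> str:
    """Pack style guidance and few-shot examples into a single prompt.

    This is prompt engineering: the model's weights are unchanged, so this
    context must be re-sent on every call and eats into the context window.
    Fine-tuning instead bakes the behavior into the weights themselves.
    """
    parts = [context]
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {instruction}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Write a short email about a delayed delivery.",
    "You are my email assistant. Keep replies brief and friendly.",
    [("Confirm tomorrow's meeting.",
      "Hi! Just confirming we're on for tomorrow at 10.")],
)
```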

In this post, we’ll walk through a surprisingly simple workflow for fine-tuning an open-source LLM locally using Ollama and MLX.


Step 1: Create Your Dataset

This is the hardest part. You’ll need a collection of input-output pairs: prompts and the ideal responses. For example, if you’re training a model to respond to emails or code questions, your dataset should reflect those use cases.

For fine-tuning Mistral with MLX, your data needs to be formatted like this (each example on its own line of a .jsonl file):

{"text": "[INST] Write a short email to a client explaining a delayed delivery. [/INST] Hey there, just a heads-up..."}

💡 Tip: You need at least 50–100 examples, but more is better. Split the data into train.jsonl, valid.jsonl, and test.jsonl for best results.
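Writing and splitting those files can be scripted with nothing but the standard library. A minimal sketch (the example pair and the 80/10/10 split are illustrative; swap in your own data):

```python
import json
import random

# Illustrative (prompt, response) pairs -- replace with your own data.
pairs = [
    ("Write a short email to a client explaining a delayed delivery.",
     "Hey there, just a heads-up..."),
    # ... aim for at least 50-100 examples in practice
]

def to_mistral_line(prompt: str, response: str) -> str:
    """Wrap one pair in Mistral's [INST] instruction format as a JSONL line."""
    return json.dumps({"text": f"[INST] {prompt} [/INST] {response}"})

random.seed(0)
random.shuffle(pairs)
n = len(pairs)
splits = {
    "train.jsonl": pairs[: int(n * 0.8)],
    "valid.jsonl": pairs[int(n * 0.8) : int(n * 0.9)],
    "test.jsonl": pairs[int(n * 0.9) :],
}
for filename, subset in splits.items():
    with open(filename, "w") as f:
        for prompt, response in subset:
            f.write(to_mistral_line(prompt, response) + "\n")
```

Point the `--data` flag in the next step at the directory containing these three files.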


Step 2: Fine-Tune with MLX

MLX is Apple’s machine-learning framework for Apple Silicon. It’s fast, lightweight, and the adapters it produces can be loaded straight into Ollama.


Install MLX-LM:

pip install mlx-lm

Log in to Hugging Face to access the Mistral model:

huggingface-cli login

Then run the fine-tuning command:

mlx_lm.lora --train \
  --model mistralai/Mistral-7B-v0.1 \
  --data /path/to/dataset \
  --batch-size 4

This will generate an adapters/ directory containing the fine-tuned weights.


Step 3: Load the Adapter into Ollama

Once training is complete, create a Modelfile that looks like this (the base model in FROM should match the one you fine-tuned, since adapters only work with the model they were trained against):

FROM mistral
ADAPTER ./adapters

Now, register the model with Ollama:

ollama create my-finetuned-model -f Modelfile
ollama run my-finetuned-model

🎉 That’s it—you’re now running a locally fine-tuned version of Mistral, optimized for your tasks and tone.
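Beyond the interactive `ollama run` session, you can also query the model from code through Ollama’s local REST API. A minimal sketch using only the standard library (the model name matches whatever you passed to `ollama create`):

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str,
                           url: str = "http://localhost:11434/api/generate") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("my-finetuned-model",
                             "Write a short email about a delayed delivery.")
# With the Ollama server running locally:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```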


Why This Is a Game-Changer

Fine-tuning used to be something reserved for ML researchers with cloud budgets. Now, with tools like MLX, Ollama, and Unsloth, anyone with a decent machine (or a Colab account) can do it.

It’s fast. It’s local. It’s yours.


What’s Next?

  • Try Unsloth for faster training on Nvidia GPUs.
  • Experiment with different prompt styles for better results.
  • Fine-tune other models like Llama 3 or Mixtral.
