What fine-tuning is
Fine-tuning is just more training — but on a small, curated dataset chosen to install a specific behavior, started from an already-pretrained model rather than from scratch. The model keeps everything it learned during pretraining and adjusts its weights to lean toward the new examples. Because the base model already understands language and the world, fine-tuning needs orders of magnitude less data and compute than pretraining — thousands of examples, not trillions of tokens.
Supervised fine-tuning and the dataset format
The most common form is supervised fine-tuning (SFT): train on explicit input → output pairs and have the model learn to produce the output given the input. For a chat assistant the dataset is a set of conversations — a prompt and the ideal response — and the model is trained to predict the response's tokens. Mechanically it is the same next-token objective as pretraining, but the loss is usually applied only to the response tokens, so the model learns to answer prompts rather than continue them.
A typical example looks like a structured record: a system instruction, a user message, and the target assistant reply. Hundreds to tens of thousands of these, written or curated by humans, are enough to convert a base model into something that reliably behaves like an assistant — which is exactly Stage 1 of the RLHF pipeline.
Instruction tuning vs task tuning
Two flavors are worth distinguishing. Instruction tuning uses a broad mix of many task types phrased as instructions, teaching the model the general skill of following instructions — this is the FLAN insight that turned zero-shot base models into helpful generalists. Task tuning narrows the dataset to one job — classify support tickets, extract fields from invoices, write in a house style — to squeeze maximum reliability on that one thing.
When to fine-tune — and when not to
Fine-tuning earns its keep when you need a consistent behavior or formatthat prompting can't reliably enforce, when you want to bake in a tone or domain so you can use a smaller, cheaper model, or when your prompt has grown into a wall of examples you'd rather move into the weights. It is the wrong tool for knowledge that changes — for that, retrieval ( RAG) is better, because you can update a document store instantly without retraining.
A useful ladder: try prompting first, then few-shot examples, then retrieval, and reach for fine-tuning when those plateau. Fine-tuning changes how the model behaves; retrieval changes what it knows at answer time. Confusing the two is the most common mistake.
The failure modes
Fine-tuning is sharp and can cut you. Catastrophic forgetting: train too hard on a narrow dataset and the model degrades at everything else. Overfitting: with too few or too repetitive examples, the model memorizes surface patterns rather than the intended behavior and generalizes poorly. And a subtle one — fine-tuning on model-generated data can amplify the base model's quirks rather than correct them. Small, clean, diverse datasets and conservative training beat large messy ones.
Full fine-tuning also produces a whole new copy of the model to store and serve, which is expensive at scale. That cost is exactly what parameter-efficient methods solve — see the LoRA / PEFT explainer, which trains a tiny set of adapter weights instead of all of them and is how most fine-tuning is done in practice today.