1 · What the three letters mean
LLM = Large Language Model. Large: it has billions of internal settings (hundreds of billions in the biggest models), trained on a huge slice of the internet. Language: its entire world is text, meaning words, code, and symbols. Model: a mathematical pattern-finder, a very powerful statistical guessing machine.
2 · The one trick: predict the next word
Everything an LLM does grows from one humble skill: guessing what comes next. Give your brain “The sky is ___” and it instantly offers blue, clear, cloudy — and ranks them (blue beats spaghetti). An LLM does exactly that, but for every word in its vocabulary at once, assigning each a probability:
After "The sky is…" (illustrative)
blue       ████████████████████████████ 62%
clear      ████████ 18%
cloudy     █████ 12%
falling    ██ 5%
spaghetti  ▌ 1%
It picks a word — usually a likely one, with a dash of randomness for variety — adds it, and repeats: next word, next word, next word. String thousands together and you get essays, code, and emails. That “dash of randomness” is a real dial you can control; the temperature & sampling explainer lets you drag it and watch the distribution change.
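A minimal Python sketch of that dial, assuming the illustrative probabilities from the chart (renormalized, since the five listed words cover only 98%). Temperature T is applied in the common way: divide the log-probabilities by T and re-apply softmax, which is the same as raising each probability to the power 1/T. The function names are hypothetical, and a real model applies this to scores over its whole token vocabulary, not five words.

```python
import math
import random

# Illustrative next-word probabilities after "The sky is..." (from the chart above).
PROBS = {"blue": 0.62, "clear": 0.18, "cloudy": 0.12, "falling": 0.05, "spaghetti": 0.01}

def apply_temperature(probs, temperature):
    """Rescale a distribution with temperature T and renormalize.

    Dividing log-probabilities by T before softmax equals p ** (1/T), renormalized.
    T < 1 sharpens toward the top choice; T > 1 flattens toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = {w: math.log(p) / temperature for w, p in probs.items()}
    top = max(logits.values())
    # Subtract the max before exponentiating for numerical stability.
    exps = {w: math.exp(l - top) for w, l in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def sample_next_word(probs, temperature=1.0, rng=random):
    """Draw one word at random, weighted by the temperature-adjusted distribution."""
    dist = apply_temperature(probs, temperature)
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]

for t in (0.5, 1.0, 2.0):
    dist = apply_temperature(PROBS, t)
    print(t, {w: round(p, 3) for w, p in dist.items()})

rng = random.Random(0)  # fixed seed so the draws are reproducible
print([sample_next_word(PROBS, 1.0, rng) for _ in range(10)])
```

At T = 0.5 almost all of the probability piles onto "blue"; at T = 2.0 the distribution flattens and even "spaghetti" gets about a 5% chance.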
3 · Tokens: how AI actually reads
Before predicting anything, the model chops text into tokens — chunks that might be a whole word, part of a word, or a single character. It thinks entirely in tokens and the numbers attached to them. cat is one token; unbelievable might be three (un·believ·able); a page of text (~500 words) is roughly 650 tokens (rule of thumb: 1 word ≈ 1.3 tokens).
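To make the chunking concrete, here is a toy greedy longest-match tokenizer over a hypothetical seven-piece vocabulary, plus the word-to-token rule of thumb. Real tokenizers (byte-pair encoding, for example) learn tens of thousands of pieces from data and also handle spaces, punctuation, and raw bytes. Treat this as an illustration of the idea, not of how any particular model splits text.

```python
# Hypothetical toy vocabulary; real vocabularies are learned and far larger.
TOY_VOCAB = {"cat", "un", "believ", "able", "the", "sky", "is"}

def tokenize_word(word, vocab=TOY_VOCAB):
    """Greedy longest match: repeatedly take the longest vocabulary piece that
    starts at the current position; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(word[i])  # no vocabulary piece matched: emit one character
            i += 1
    return tokens

def estimate_tokens(word_count, tokens_per_word=1.3):
    """Rule-of-thumb estimate from the text: 1 word ≈ 1.3 tokens."""
    return round(word_count * tokens_per_word)

print(tokenize_word("cat"))           # ['cat']
print(tokenize_word("unbelievable"))  # ['un', 'believ', 'able']
print(estimate_tokens(500))           # 650
```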
4 · How it’s trained, in three stages
A raw model off the assembly line is useless — a brain with no memories. Three stages turn it into an assistant:
- Pre-training: “read the internet.” Fed an enormous amount of text, it predicts the next token billions (for the largest models, trillions) of times, absorbing grammar, facts, and reasoning patterns. It is slow and expensive, and it is where the “large” comes from.
- Supervised fine-tuning: “learn to be an assistant.” Humans write example conversations, and the model learns the format of being helpful.
- RLHF (reinforcement learning from human feedback): “learn what people prefer.” Humans rate answers, and the model is nudged toward what people like and away from harm. This is the polish that makes it feel friendly and safe.
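Pre-training scores every next-token guess with a loss. For language models the standard choice is cross-entropy: the negative log of the probability the model gave to the token that actually came next. A small sketch, reusing the illustrative sky distribution; the function names are hypothetical. Training adjusts the model's settings to push the average of this number down across its text.

```python
import math

def next_token_loss(predicted, actual):
    """Cross-entropy for one prediction: -ln(probability given to the actual next token)."""
    return -math.log(predicted[actual])

def average_loss(predictions, actual_tokens):
    """Mean loss over a sequence of (predicted distribution, actual next token) pairs."""
    losses = [next_token_loss(p, a) for p, a in zip(predictions, actual_tokens)]
    return sum(losses) / len(losses)

# Illustrative distribution after "The sky is..."
predicted = {"blue": 0.62, "clear": 0.18, "cloudy": 0.12, "falling": 0.05, "spaghetti": 0.01}

print(round(next_token_loss(predicted, "blue"), 3))       # 0.478  (good guess, small loss)
print(round(next_token_loss(predicted, "spaghetti"), 3))  # 4.605  (surprising word, large loss)
```

The log makes confident mistakes expensive: giving the real next word 1% probability costs roughly ten times more than giving it 62%.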
5 · What’s actually inside
There’s no database of facts, no folder of answers. Everything the model learned is compressed into parameters: billions of numerical “dials” (weights), nudged ever so slightly during training until the model gets good at prediction. A trained LLM is a lossy, compressed summary of everything it read, like a blurry JPEG of the internet.
This is why LLMs are called black boxes: even their builders can’t point to one dial and say “that’s where it stores the capital of France.” The knowledge is smeared across billions of numbers working together. (Training is also where you can change those dials for your own task — see how to train an open model.)
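“Nudged ever so slightly” is gradient descent. Here is a toy version with just three dials (one score per word after “The sky is…”), trained on a hypothetical set of ten observed continuations. Each step moves every dial a small amount in the direction that lowers the average cross-entropy loss. Real models do the same across billions of dials, with gradients computed by backpropagation; the data, learning rate, and step count here are arbitrary assumptions.

```python
import math

def softmax(logits):
    """Turn raw dial values into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

WORDS = ["blue", "clear", "cloudy"]
# Hypothetical "training data": continuations of "The sky is..." seen in text.
observed = ["blue"] * 6 + ["clear"] * 3 + ["cloudy"] * 1
target = [observed.count(w) / len(observed) for w in WORDS]  # observed frequencies

dials = [0.0, 0.0, 0.0]  # start with no preference (uniform predictions)
learning_rate = 0.5

for step in range(2000):
    probs = softmax(dials)
    # For softmax + average cross-entropy, the gradient for each dial is (predicted - observed).
    grads = [p - t for p, t in zip(probs, target)]
    # Nudge each dial a little against its gradient.
    dials = [d - learning_rate * g for d, g in zip(dials, grads)]

print([round(p, 3) for p in softmax(dials)])  # [0.6, 0.3, 0.1]
```

Adding the same constant to all three dials gives exactly the same predictions, so no single dial value means anything on its own; only the combination does, a small-scale echo of the black-box point.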
6 · Why it confidently makes things up
Because an LLM’s only true skill is generating plausible-sounding text, it will sometimes produce something perfectly confident and completely wrong. This is a hallucination, and it is a natural consequence of how prediction works, not a random bug. The model has no built-in concept of truth; it predicts words that fit, and a wrong fact can fit just as smoothly as a right one. A fake book title is statistically shaped like a real one.
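A deliberately tiny analogy for this failure mode: a hypothetical word-pair (bigram) model trained on two true sentences. It only ever picks words it has seen follow the previous word, yet it can still produce a fluent sentence that never appeared in its data and is false. Real LLMs are vastly more capable than a bigram model, so this illustrates the principle (a plausible fit is not the same as truth), not their actual mechanics.

```python
from collections import defaultdict

# Hypothetical two-sentence "training corpus"; both sentences are true.
CORPUS = ["paris is the capital of france", "rome is the capital of italy"]

def train_bigrams(sentences):
    """Record which words were seen directly after each word."""
    followers = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            followers[prev].add(nxt)
    return followers

def all_continuations(followers, start, max_len=6):
    """Enumerate every sentence the model could generate from `start`
    by only choosing words it has seen follow the previous one."""
    results = []

    def walk(path):
        options = followers.get(path[-1])
        if not options or len(path) == max_len:
            results.append(" ".join(path))
            return
        for word in sorted(options):
            walk(path + [word])

    walk([start])
    return results

model = train_bigrams(CORPUS)
for sentence in all_continuations(model, "paris"):
    status = "in corpus" if sentence in CORPUS else "not in corpus"
    print(f"{sentence!r}: {status}")
```

Every step of "paris is the capital of italy" looks familiar to the model, and that is exactly why it comes out sounding as confident as the true sentence.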
Good at:
- Drafting, rewriting, summarizing
- Brainstorming & outlining
- Explaining concepts simply
- Translating & changing tone
- Writing & debugging code
Be careful with:
- Specific facts, dates, statistics
- Citations & quotes (often invented)
- Math & precise calculations
- Recent news after its training cutoff
- Legal, medical, financial advice
7 · How to actually use them well
Understanding the machinery makes you dramatically better at using it. Four principles that follow directly from how LLMs work:
- Give rich context. The model only “knows” what’s in the conversation plus its training — more relevant detail, better predictions.
- Be specific about the output. Ask for the format, tone, length, and audience. “Explain like I’m 12, in 3 bullets” beats “explain this.”
- Iterate. Treat it as a back-and-forth; refine the answer the way you’d coach a junior teammate.
- Verify the important stuff. Draft and think with it — fact-check anything with real-world consequences.
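The first two principles can be made almost mechanical with a small template helper. This is a hypothetical sketch using plain string formatting, not any vendor's API; the field names are illustrative assumptions.

```python
def build_prompt(task, context="", audience=None, fmt=None, length=None, tone=None):
    """Assemble a prompt that states the context and the output requirements explicitly.

    Hypothetical helper: empty fields are simply left out.
    """
    parts = []
    if context:
        parts.append(f"Context:\n{context.strip()}")
    parts.append(f"Task: {task.strip()}")
    requirements = []
    if audience:
        requirements.append(f"audience: {audience}")
    if fmt:
        requirements.append(f"format: {fmt}")
    if length:
        requirements.append(f"length: {length}")
    if tone:
        requirements.append(f"tone: {tone}")
    if requirements:
        parts.append("Requirements: " + "; ".join(requirements))
    return "\n\n".join(parts)

vague = build_prompt("Explain this.")
specific = build_prompt(
    "Explain how a large language model picks its next word.",
    context="The reader has never studied machine learning.",
    audience="a 12-year-old",
    fmt="bullet points",
    length="3 bullets",
)
print(vague)
print("---")
print(specific)
```

The vague version gives the model nothing to condition on beyond the bare request; the specific one spells out who the answer is for and what shape it should take.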
8 · The six things to remember
- An LLM is a giant next-word predictor — a supercharged autocomplete.
- It reads in tokens, not words or letters.
- It’s trained in three stages: pre-training → fine-tuning → human feedback.
- Its “knowledge” lives in billions of numerical dials, not a database.
- It can hallucinate — confidently wrong — so always verify facts.
- Better context + clearer prompts = dramatically better results.
Now that the foundation is in place, the deeper explainers will make sense:
- Tokenizers · Temperature & sampling · Attention — how the prediction actually happens.
- Pretraining · RLHF · Train your own — how models are made and shaped.
- RAG · Tool use — how models reach past their cutoff.
- Leaderboard — which model is actually best (and cheapest) for a given job.