What an LLM actually is — a supercharged autocomplete

No math, no jargon. If you’ve used ChatGPT or Claude and wondered what’s really happening under the hood, this is the ground floor. Once it clicks, every other explainer on this site makes more sense — and you’ll use these tools far better. We’ll link the deep dives as we go.

1 · What the three letters mean

LLM = Large Language Model. Large: it has billions of internal settings (hundreds of billions in the biggest models) and is trained on a huge slice of the internet. Language: its entire world is text, meaning words, code, and symbols. Model: a mathematical pattern-finder, a very powerful statistical guessing machine.

Mental model

Think of an LLM as the autocomplete on your phone — but one that swallowed most of the internet, books, code, and Wikipedia, then practiced predicting the next word a trillion times. At its core it does not “think” or “know” the way you do. It has learned, with eerie precision, what tends to come next.

2 · The one trick: predict the next word

Everything an LLM does grows from one humble skill: guessing what comes next. Give your brain “The sky is ___” and it instantly offers blue, clear, cloudy — and ranks them (blue beats spaghetti). An LLM does exactly that, but for every word in its vocabulary at once, assigning each a probability:

After "The sky is…"  (illustrative)
blue       ████████████████████████████  62%
clear      ████████                       18%
cloudy     █████                          12%
falling    ██                              5%
spaghetti  ▌                               1%

It picks a word — usually a likely one, with a dash of randomness for variety — adds it, and repeats: next word, next word, next word. String thousands together and you get essays, code, and emails. That “dash of randomness” is a real dial you can control; the temperature & sampling explainer lets you drag it and watch the distribution change.
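For the curious, here is a minimal sketch of that pick-a-word step in Python. It reuses the illustrative percentages from the chart (they are not real model output, and they do not add up to 100% because the rest of the vocabulary is left out, so the code rescales them). The function names are made up for this example, and the temperature formula shown is the common textbook version rather than any particular product's implementation.

```python
import math
import random

# Illustrative probabilities copied from the chart; not real model output.
NEXT_WORD_PROBS = {
    "blue": 0.62,
    "clear": 0.18,
    "cloudy": 0.12,
    "falling": 0.05,
    "spaghetti": 0.01,
}

def apply_temperature(probs, temperature):
    """Reshape a distribution: low temperature sharpens it, high flattens it."""
    # Dividing log-probabilities by T is the same as raising p to the power 1/T.
    scaled = {word: math.exp(math.log(p) / temperature) for word, p in probs.items()}
    total = sum(scaled.values())
    # Rescale so the adjusted probabilities sum to 1.
    return {word: value / total for word, value in scaled.items()}

def pick_next_word(probs, temperature=1.0, rng=random):
    """Sample one word according to the temperature-adjusted probabilities."""
    adjusted = apply_temperature(probs, temperature)
    words = list(adjusted)
    weights = [adjusted[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]

for t in (0.2, 1.0, 2.0):
    adjusted = apply_temperature(NEXT_WORD_PROBS, t)
    print(t, {word: round(p, 3) for word, p in adjusted.items()})

print(pick_next_word(NEXT_WORD_PROBS, temperature=1.0))
```

At a low temperature almost all of the weight lands on "blue"; at a high temperature the unlikely words get a real chance. A full generator would call `pick_next_word` in a loop, appending each choice to the text before asking for the next one.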

Analogy

It’s like knitting a scarf one stitch at a time. Each stitch depends on the ones before it; the model never sees the whole scarf in advance — it just keeps adding the most sensible next stitch until you tell it to stop.

3 · Tokens: how AI actually reads

Before predicting anything, the model chops text into tokens — chunks that might be a whole word, part of a word, or a single character. It thinks entirely in tokens and the numbers attached to them. cat is one token; unbelievable might be three (un·believ·able); a page of text (~500 words) is roughly 650 tokens (rule of thumb: 1 word ≈ 1.3 tokens).
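To make the chopping concrete, here is a toy tokenizer in Python. It is only a sketch: the four-entry vocabulary is hypothetical and was picked to reproduce the example above, and the greedy longest-piece rule is a simplification. Real tokenizers learn vocabularies of tens of thousands of pieces from data and may split the same words differently.

```python
# Hypothetical mini-vocabulary, chosen only to reproduce the example in the text.
TOY_VOCAB = {"cat", "un", "believ", "able"}

def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split a word into the longest known pieces, reading left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens

print(toy_tokenize("cat"))           # ['cat']
print(toy_tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(toy_tokenize("dog"))           # ['d', 'o', 'g']
```

Note that the output is a list of chunks, not letters. Once text is in this form, the model never sees the individual characters inside "believ".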

Why this matters to you

“Context windows” and pricing are measured in tokens, not words. “128K tokens” is roughly 100,000 words, or a 200- to 300-page book (depending on page size) that the model can hold in mind at once. It’s also why LLMs miscount letters in a word: they see tokens, not letters. The tokenizers explainer shows exactly how text gets split.
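The token arithmetic in this section can be checked with a few lines of Python. This is a rough sketch: the 1.3 tokens-per-word ratio and the 500-words-per-page figure are the rules of thumb from the text, real tokenizers vary by language and content, and the function names are made up for illustration.

```python
TOKENS_PER_WORD = 1.3  # rule of thumb quoted in the text
WORDS_PER_PAGE = 500   # page size used in the text; real books vary

def words_to_tokens(word_count):
    """Estimate how many tokens a given number of words becomes."""
    return round(word_count * TOKENS_PER_WORD)

def tokens_to_words(token_count):
    """Estimate how many words fit in a given number of tokens."""
    return round(token_count / TOKENS_PER_WORD)

def tokens_to_pages(token_count, words_per_page=WORDS_PER_PAGE):
    """Estimate how many pages of text fit in a given number of tokens."""
    return round(tokens_to_words(token_count) / words_per_page)

print(words_to_tokens(500))           # 650
print(tokens_to_words(128_000))       # 98462
print(tokens_to_pages(128_000))       # 197
print(tokens_to_pages(128_000, 330))  # 298
```

At 500 words per page, 128K tokens comes to about 197 pages; at a sparser 330 words per page it is closer to 300. Page counts depend heavily on what you call a page, which is one reason limits and prices are quoted in tokens.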

4 · How it’s trained, in three stages

A raw model off the assembly line is useless — a brain with no memories. Three stages turn it into an assistant:

Pre-training: “read the internet.” Fed an enormous amount of text, it predicts the next token billions of times, absorbing grammar, facts, and reasoning patterns. Slow and expensive; this is the “large.” Deep dive →
Supervised fine-tuning: “learn to be an assistant.” Humans write example conversations; the model learns the format of being helpful. Deep dive →
RLHF (reinforcement learning from human feedback): “learn what people prefer.” Humans rate answers; the model is nudged toward what people like and away from harm. The polish that makes it feel friendly and safe. Deep dive →
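The first stage is the easiest to picture in code. The sketch below is a deliberately tiny stand-in for pre-training: it "reads" a few sentences and simply counts which word follows which. Real pre-training adjusts the weights of a neural network rather than keeping a table of counts, and the corpus and function names here are invented for illustration. The shared idea is that the ability to predict comes from exposure to text and nothing else. The later two stages are not shown.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count which word follows which; a tiny stand-in for pre-training."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up training text for the example.
corpus = "the sky is blue . the sky is clear . the sky is blue . the grass is green ."
model = train_bigram_model(corpus)

print(predict_next(model, "is"))         # blue
print(predict_next(model, "sky"))        # is
print(predict_next(model, "spaghetti"))  # None
```

The last line shows the limit of learning from text alone: a word the model never saw gives it nothing to go on.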

Key insight

The model’s knowledge is frozen at the moment training ended; that’s its “cutoff.” It won’t know about later events unless it’s connected to live search or tools, which is what tool use and retrieval-augmented generation (RAG) add.

5 · What’s actually inside

There’s no database of facts, no folder of answers. Everything the model learned is compressed into parameters: billions of numerical “dials” (weights), nudged ever so slightly during training until the model gets good at prediction. A trained LLM is a lossy, compressed summary of everything it read, like a blurry JPEG of the internet.

This is why LLMs are called black boxes: even their builders can’t point to one dial and say “that’s where it stores the capital of France.” The knowledge is smeared across billions of numbers working together. (Training is also where you can change those dials for your own task — see how to train an open model.)
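What "nudging a dial" means can be shown with a single dial. The sketch below assumes a made-up task (learn to multiply by 3) and uses plain gradient descent, the standard textbook method of adjusting a weight in the direction that shrinks the error. A real LLM applies the same kind of small correction to billions of weights at once; the examples, learning rate, and function name here are illustrative only.

```python
def train_single_dial(examples, learning_rate=0.01, steps=200):
    """Nudge one weight `w` until `w * x` matches the targets."""
    w = 0.0  # the dial starts at an uninformed setting
    for _ in range(steps):
        for x, target in examples:
            prediction = w * x
            error = prediction - target
            # The gradient of the squared error (error ** 2) with respect to w
            # is 2 * error * x, so step a little in the opposite direction.
            w -= learning_rate * 2 * error * x
    return w

# Made-up examples of the pattern "output = 3 * input".
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = train_single_dial(examples)
print(round(w, 4))  # 3.0
```

With one dial you can read the answer straight off it. With billions of dials that all influence each other, no single value means anything by itself, which is where the black-box reputation comes from.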

Analogy

Imagine a giant mixing board with billions of knobs. Training nudges every knob a hair at a time until the music — the predictions — sounds right. No one can tell you what any single knob does, but together they make the music.

6 · Why it confidently makes things up

Because an LLM’s only true skill is generating plausible-sounding text, it will sometimes produce something perfectly confident and completely wrong. This is a hallucination, and it’s a side effect of how prediction works, not a random bug. The model has no concept of truth; it predicts words that fit, and a wrong fact can fit just as smoothly as a right one. A fake book title is statistically shaped like a real one.

Important

Never trust an LLM blindly for facts, figures, citations, or legal/medical/financial information. Treat it as a brilliant, fast, occasionally overconfident intern, and verify anything that matters.

✅ Plays to its strengths

Drafting, rewriting, summarizing
Brainstorming & outlining
Explaining concepts simply
Translating & changing tone
Writing & debugging code

⚠️ Verify carefully

Specific facts, dates, statistics
Citations & quotes (often invented)
Math & precise calculations
Recent news after its cutoff
Legal, medical, financial advice

7 · How to actually use them well

Understanding the machinery makes you dramatically better at using it. Four principles that follow directly from how LLMs work:

Give rich context. The model only “knows” what’s in the conversation plus its training — more relevant detail, better predictions.
Be specific about the output. Ask for the format, tone, length, and audience. “Explain like I’m 12, in 3 bullets” beats “explain this.”
Iterate. Treat it as a back-and-forth; refine the answer the way you’d coach a junior teammate.
Verify the important stuff. Draft and think with it — fact-check anything with real-world consequences.
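The first two principles are easy to turn into a habit with a small template. The sketch below just assembles a text prompt from labeled parts; it does not call any model or API, and the field names (`context`, `audience`, and so on) are one possible layout invented for this example, not a required format.

```python
def build_prompt(task, context="", audience="", output_format="", tone="", length=""):
    """Assemble a prompt that states the task plus any details provided."""
    lines = [f"Task: {task}"]
    optional_parts = [
        ("Context", context),
        ("Audience", audience),
        ("Format", output_format),
        ("Tone", tone),
        ("Length", length),
    ]
    for label, value in optional_parts:
        if value:  # skip anything left blank
            lines.append(f"{label}: {value}")
    return "\n".join(lines)

vague = build_prompt("Explain this.")
specific = build_prompt(
    "Explain what a token is.",
    context="The reader has used chat assistants but has never programmed.",
    audience="a 12-year-old",
    output_format="3 bullets",
    tone="friendly",
)
print(vague)
print()
print(specific)
```

The second prompt gives the model far more to condition its predictions on, which is the whole point of "give rich context" and "be specific about the output."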

Pro tip — prompting is a skill

The quality of your output is shaped massively by the quality of your input. Learning to prompt well is the single highest-leverage skill for getting value from AI — and it needs zero coding. The same idea, taken to the model-selection layer, is routing: sending each request to the model that’s actually best for it.

8 · The six things to remember

An LLM is a giant next-word predictor — a supercharged autocomplete.
It reads in tokens, not words or letters.
It’s trained in three stages: pre-training → fine-tuning → human feedback.
Its “knowledge” lives in billions of numerical dials, not a database.
It can hallucinate — confidently wrong — so always verify facts.
Better context + clearer prompts = dramatically better results.

Where to go next

Now that the foundation is in place, the deeper explainers will make sense:

Tokenizers · Temperature & sampling · Attention — how the prediction actually happens.
Pretraining · RLHF · Train your own — how models are made and shaped.
RAG · Tool use — how models reach past their cutoff.
Leaderboard — which model is actually best (and best-value) for a given job.
