Turning meaning into geometry

An embedding maps a word, sentence, image, or document to a point in a high-dimensional space — engineered so that things with similar meaning land close together. It is the quiet substrate under search, RAG, clustering, recommendation, and deduplication. Once you see meaning as direction in space, a lot of modern AI stops being mysterious.

A vector that means something

An embedding is just a list of numbers (a vector), say 768 or 1,536 of them. What makes it special is how those numbers are produced: a model is trained so that the position of the vector encodes meaning. Words, sentences, or documents that are semantically similar get vectors that point in similar directions; unrelated ones point elsewhere. The individual numbers aren't human-interpretable, but their geometry is.

This is distinct from the RAG explainer, which uses embeddings as one step in a retrieval pipeline. Here we look at the embeddings themselves: what they are, how similarity is measured, and why they work.

Cosine similarity: measure the angle, not the distance

Given two embedding vectors, the standard way to score how related they are is cosine similarity — the cosine of the angle between them. It ranges from +1 (pointing the same direction, nearly identical meaning) through 0 (perpendicular, unrelated) to −1 (opposite). Crucially it ignores magnitude and looks only at direction, which is why a short query and a long document can be compared fairly even though their raw vector lengths differ.

Mechanically: take the dot product of the two vectors and divide by the product of their lengths. That normalization is the whole trick — it strips out “how big” and keeps “which way.” Try it below.
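The recipe above (dot product divided by the product of the lengths) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `cosine_similarity` and the 4-dimensional toy vectors are made up for the example and are not output from any real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, hand-written for illustration.
query = [1.0, 2.0, 0.0, 1.0]
long_doc = [3.0, 6.0, 0.0, 3.0]     # same direction as query, three times the length
unrelated = [0.0, 0.0, 5.0, 0.0]    # perpendicular to query
opposite = [-1.0, -2.0, 0.0, -1.0]  # points the other way

print(cosine_similarity(query, long_doc))   # approximately 1.0
print(cosine_similarity(query, unrelated))  # 0.0
print(cosine_similarity(query, opposite))   # approximately -1.0
```

Note that `long_doc` is three times as long as `query` yet still scores about 1.0: dividing by the lengths removes magnitude and leaves only direction.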

Try it: compare any two phrases

Type two phrases of your own and a real embedding model turns each into a 768-dimensional vector; we then compute their cosine similarity live. Try a sentence and its paraphrase with no shared words — they still score high, because the model compares meaning, not surface text. Switch to toy vectors to see the bare 4-dimensional mechanic offline.

Real embeddings from BAAI/bge-base-en-v1.5 (768 dimensions), computed live; cosine is exact. Notice the model scores paraphrases high even with no shared words, and unrelated sentences much lower: it’s comparing meaning, not surface text.

How embeddings get built

Early word embeddings (Word2Vec, GloVe) learned a single fixed vector per word from co-occurrence statistics — the famous result that king − man + woman ≈ queen came from there. Modern text embeddings are contextual and produced by transformer encoders (see encoder vs decoder): they read a whole sentence and emit a vector that reflects meaning in context, so “bank” in a river sentence lands far from “bank” in a finance one.

They are trained with objectives that pull related pairs together and push unrelated pairs apart, an approach known as contrastive learning. The same idea extends across modalities: CLIP trains image and text encoders into one shared space so a photo of a dog lands near the words “a dog,” which is the foundation of multimodal models (covered in the vision encoders explainer).
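To make "pull together, push apart" concrete, here is a rough sketch of one common contrastive objective, an InfoNCE-style loss for a single anchor. The text above does not say which loss any particular model uses, so treat the loss form, the `temperature` value, and the toy vectors as assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the positive is the closest candidate."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    # Numerically stable log-sum-exp; the positive's logit sits at index 0.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Hypothetical 3-dimensional toy vectors.
anchor = [1.0, 0.0, 0.0]
negatives = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

well_placed = contrastive_loss(anchor, [0.9, 0.1, 0.0], negatives)
badly_placed = contrastive_loss(anchor, [0.1, 0.9, 0.0], negatives)
print(well_placed < badly_placed)  # prints True
```

Training nudges the encoder's weights to lower this loss, which rotates related pairs toward each other and unrelated pairs away. A real training loop would compute it over large batches with automatic differentiation.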

Why dimensionality matters

More dimensions give the space more room to separate fine distinctions, up to a point. There's a tradeoff against storage and search speed, since every stored item is a vector and every query compares against many of them. Production systems keep millions to billions of embeddings in a vector database with approximate-nearest-neighbor indexes so that “find the closest vectors to this query” runs in milliseconds rather than scanning everything.
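A quick back-of-the-envelope calculation shows the storage side of that tradeoff. The sketch below assumes each number is stored as a 32-bit float (4 bytes) and counts only the raw vectors, ignoring index structures and metadata. The helper name `index_size_gib` is hypothetical.

```python
def index_size_gib(num_vectors, dims, bytes_per_value=4):
    """Raw size of the vectors alone, assuming float32 and no index overhead."""
    return num_vectors * dims * bytes_per_value / 2**30

for n in (1_000_000, 1_000_000_000):
    for d in (768, 1536):
        print(f"{n:>13,} vectors x {d:>4} dims = {index_size_gib(n, d):,.1f} GiB")
```

Under these assumptions a million 768-dimensional vectors take roughly 2.9 GiB, and doubling the dimensionality doubles both that figure and the arithmetic per comparison.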

That single operation — embed a query, find its nearest neighbors — powers semantic search, recommendation (“items near what you liked”), clustering (group nearby vectors), deduplication (near-identical vectors are duplicates), and the retrieval step of RAG. Different jobs, one geometric primitive.
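The shared primitive can be sketched as an exact, brute-force search over a handful of stored vectors, with deduplication as a threshold on the same score. This is not how a vector database works internally (those use approximate indexes to avoid scanning everything). The function names, the 0.99 threshold, and the toy 3-dimensional vectors are all assumptions for illustration.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query, items, k):
    """Exact top-k search: score every stored vector, keep the k best labels."""
    ranked = sorted(items, key=lambda label: cosine(query, items[label]), reverse=True)
    return ranked[:k]

def near_duplicates(items, threshold=0.99):
    """Pairs of labels whose vectors are almost parallel."""
    return [(a, b) for a, b in combinations(items, 2)
            if cosine(items[a], items[b]) >= threshold]

# Hypothetical toy vectors, hand-written rather than produced by a model.
items = {
    "dog care guide": [0.9, 0.1, 0.0],
    "dog care guide (repost)": [0.9, 0.1, 0.01],
    "puppy training tips": [0.8, 0.3, 0.1],
    "tax filing deadlines": [0.0, 0.2, 0.9],
}

query = [1.0, 0.2, 0.0]  # stands in for an embedded query about dogs
print(nearest(query, items, k=3))
print(near_duplicates(items))
```

The search returns the two dog-care entries and the puppy item ahead of the tax document, and the duplicate check flags only the guide and its repost. Semantic search, recommendation, and deduplication differ mainly in what is done with these scores.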

The takeaway

Embeddings turn the fuzzy notion of “similar meaning” into the precise, cheap operation of “small angle between vectors.” Once meaning lives in a shared geometric space, comparison, search, and grouping become arithmetic — and that is why embeddings show up underneath so much of what models do, even when you never see them directly.
