The Concept
Embeddings are fixed-length numeric vectors that represent the meaning of text. Virtually every modern ML model that works with language uses them under the hood. When someone says "vector search" or "semantic similarity," they're talking about comparing embeddings. And if you've ever used a hash function, you already understand 80% of how they work.
Most explanations of embeddings assume you're starting from zero. If you've built systems for a living, you aren't. You already have the mental model — it just hasn't been pointed at this primitive yet.
If You Already Know Hash Functions, You Already Know Most of This
Think about what a hash function actually does. It takes an arbitrary input — a string, a file, a stream of bytes — and produces a fixed-length output. MD5 gives you 128 bits. SHA-256 gives you 256 bits. The input can be anything. The output is always the same size. This is so fundamental to how you think about systems that you probably don't even notice you're relying on it.
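To see the fixed-size contract concretely, here is a quick check with Python's standard-library `hashlib`. Inputs of wildly different lengths all come back as 128-bit MD5 and 256-bit SHA-256 digests.

```python
import hashlib

# Inputs of very different sizes: empty, one word, and a megabyte of bytes.
inputs = [b"", b"hello", b"x" * 1_000_000]

for data in inputs:
    md5_bits = len(hashlib.md5(data).digest()) * 8
    sha256_bits = len(hashlib.sha256(data).digest()) * 8
    # The digest length never depends on the input length.
    print(f"{len(data):>9} bytes in -> MD5: {md5_bits} bits, SHA-256: {sha256_bits} bits")
```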
An embedding function does the exact same thing. It takes text (or an image, or audio) and produces a fixed-length vector. OpenAI's text-embedding-3-small gives you 1536 floats. Cohere's embed-v3 gives you 1024. The input can be a word, a sentence, or a full document, up to the model's input limit (8,191 tokens for OpenAI's text-embedding-3 models). The output is always the same dimensionality.
The parallels are striking:
| Hash function | Embedding function |
|---|---|
| Arbitrary input → fixed output | Arbitrary input → fixed output |
| Deterministic | Deterministic (for a fixed model version) |
| Output is compact | Output is compact |
| You compare outputs to check for equality | You compare outputs to check for similarity |
Read that last row again. That's where the entire value of embeddings lives.
What's Actually New
Here's where the analogy breaks, and it breaks in the most important way.
Cryptographic hash functions are designed so that similar inputs produce completely different outputs. Change one bit of the input and roughly half the output bits flip; the hash is unrecognizable. That's the point. It's a security property. You rely on it every time you store a password or verify a download.
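A small sketch of that property (usually called the avalanche effect) with `hashlib`: change one character of the input and count how many of SHA-256's 256 output bits differ. For a well-behaved cryptographic hash you should see roughly half of them flip. The helper name `bit_difference` is just for illustration.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the output bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 wherever the two digests disagree; count those 1s.
    return bin(da ^ db).count("1")

before = b"The server crashed at 3am"
after = b"The server crashed at 4am"  # one character changed
print(f"{bit_difference(before, after)} of 256 bits differ")
```

An embedding model would place those two sentences almost on top of each other. Their hashes, by design, look essentially unrelated.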
Embedding functions are designed for the exact opposite. Similar inputs produce similar outputs. "The server crashed at 3am" and "Our backend went down overnight" should produce vectors that are close together in high-dimensional space. The entire value proposition is that meaning is preserved in the geometry.
This is genuinely new. Not "new" in the way most AI things are new — which is to say, a rebranding of something that already existed. This is a fundamentally different contract between input and output. And it enables things that are impossible with the tools you currently use:
- Finding documents that are about the same thing, even when they use different words
- Ranking results by relevance instead of binary match/no-match
- Clustering data by meaning without defining the categories upfront
The distance between two vectors tells you how semantically similar the inputs are. Cosine similarity is the standard measure, and it's exactly as simple as it sounds:
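import numpy as np
def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths: the cosine of the angle between a and b
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))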
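Scores range from -1 to 1. Similar sentences score close to 1.0; unrelated sentences score noticeably lower, but how low depends on the model, so rank by relative score instead of trusting a fixed cutoff. That's the whole interface.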
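A few toy vectors make the behavior of `cosine_similarity` concrete. These are hand-picked 2- and 3-dimensional stand-ins, not real embeddings, chosen so the answers are easy to check: same direction scores 1, perpendicular scores 0, opposite scores -1, and scaling a vector doesn't change the score. The last comparison shows why many pipelines skip the division entirely: OpenAI's documentation states its embeddings are normalized to length 1, and for unit vectors cosine similarity is just the dot product.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy vectors, not real embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # ~1.0: length is ignored
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])            # 0.0: nothing in common
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])                # -1.0: pointing away

# For unit-length vectors the denominator is 1, so a plain dot product gives the same score.
a = np.array([3.0, 4.0]) / 5.0  # length 1
b = np.array([4.0, 3.0]) / 5.0  # length 1
print(same_direction, perpendicular, opposite)
print(cosine_similarity(a, b), np.dot(a, b))
```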
Under the Hood
The practical reality of using embeddings is anticlimactic. That's a feature.
from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(
model="text-embedding-3-small",
input="Kubernetes pod scheduling"
)
vector = response.data[0].embedding # List of 1536 floats
That's it. You now have a point in 1536-dimensional space. Store it somewhere. When a query comes in, embed the query with the same model and find the nearest stored vectors. This is the core pattern behind most "AI-powered search" features shipped in the last two years.
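The "find the nearest stored vectors" step can be done by brute force before you reach for any database. A minimal sketch, assuming your stored vectors are stacked into one NumPy matrix with one row per document. The 4-dimensional values and the `top_k_nearest` name are made up for illustration; in practice each row would be a 1536-float vector from the embeddings call above.

```python
import numpy as np

def top_k_nearest(query, matrix, k=3):
    """Return (row index, cosine score) pairs for the k rows most similar to query."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    # Normalize every row and the query so one matrix-vector product yields cosine scores.
    rows = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = rows @ q
    # argsort is ascending, so reverse it and keep the first k.
    best = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in best]

# Toy "stored" vectors, one row per document (hypothetical values).
stored = np.array([
    [0.9, 0.1, 0.0, 0.0],   # doc 0
    [0.0, 0.8, 0.6, 0.0],   # doc 1
    [0.1, 0.0, 0.1, 0.99],  # doc 2
])
query = [0.8, 0.2, 0.0, 0.1]
print(top_k_nearest(query, stored, k=2))
```

This scans every row on every query, which is fine for thousands of vectors and is exactly the cost that dedicated vector indexes exist to avoid at millions.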
The thing that matters for systems engineers, and the thing most tutorials never mention, is that embedding generation behaves like a pure function with no side effects. Same model version plus same input gives you the same vector (a hosted API can differ in the last few decimal places between calls, which is negligible for search). Which means you can cache aggressively. You can batch. You can run it offline during a nightly job and store the results. You can put it behind the same caching layer you'd put any other deterministic computation behind.
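Because the output depends only on the model and the input, a cache keyed on both is safe, and cache misses can go out in one batched request. A minimal sketch under a few assumptions: `CachedEmbedder` is a hypothetical name, the cache is an in-process dict (swap in Redis or a database table for real use), and `embed_batch` is any function that takes a list of strings and returns one vector per string in the same order. The model name belongs in the key because vectors from different models live in different spaces; switching models means re-embedding everything, not mixing old and new vectors.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

class CachedEmbedder:
    """Memoizes embeddings per (model, text) and batches cache misses into one call."""

    def __init__(self, model: str, embed_batch: Callable[[List[str]], List[Vector]]):
        self.model = model
        self.embed_batch = embed_batch
        self.cache: Dict[Tuple[str, str], Vector] = {}

    def _key(self, text: str) -> Tuple[str, str]:
        # Hashing the text keeps keys compact; the model name keeps vector spaces apart.
        return (self.model, hashlib.sha256(text.encode("utf-8")).hexdigest())

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        # Unique texts not cached yet, in first-seen order.
        missing = list(dict.fromkeys(t for t in texts if self._key(t) not in self.cache))
        if missing:
            vectors = self.embed_batch(missing)  # one batched call for all misses
            for text, vector in zip(missing, vectors):
                self.cache[self._key(text)] = vector
        return [self.cache[self._key(t)] for t in texts]

# Stand-in for a real embedding call: records each batch it receives.
calls = []
def fake_embed_batch(texts):
    calls.append(list(texts))
    return [[float(len(t))] for t in texts]

embedder = CachedEmbedder("text-embedding-3-small", fake_embed_batch)
embedder.embed(["deploy to staging", "pod scheduling"])
embedder.embed(["pod scheduling", "rollback steps"])
print(calls)  # the second batch contains only the uncached "rollback steps"
```

To wire it to the API, `embed_batch` could call `client.embeddings.create(model="text-embedding-3-small", input=texts)`, which accepts a list of strings, and return `[d.embedding for d in sorted(response.data, key=lambda d: d.index)]`. Very large batches still need to be split to stay within the API's per-request limits.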
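The cost profile makes this practical. Embedding is far cheaper than an LLM completion call (often by around two orders of magnitude per token, depending on which LLM you compare against), and the model itself processes short text in milliseconds, with a hosted API adding network round-trip time on top. The expensive part is storing and searching millions of vectors efficiently, which is where vector databases come in. That's next week.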
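To put numbers on "cheap," here is back-of-the-envelope arithmetic. It assumes a list price of $0.02 per million input tokens for text-embedding-3-small (the published price at the time of writing; check OpenAI's pricing page before budgeting) and hypothetical corpus sizes.

```python
PRICE_PER_MILLION_TOKENS = 0.02  # USD; assumed list price for text-embedding-3-small

def embedding_cost(num_chunks: int, tokens_per_chunk: int,
                   price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Estimated USD cost to embed num_chunks chunks of tokens_per_chunk tokens each."""
    total_tokens = num_chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * price_per_million

# Hypothetical corpora: a small team wiki and a million-chunk archive.
print(f"${embedding_cost(200, 500):.4f}")        # 100,000 tokens
print(f"${embedding_cost(1_000_000, 500):.2f}")  # 500 million tokens
```

Even a million 500-token chunks comes to about ten dollars at that price, which is why the recurring cost lives in storage and search rather than in generating the vectors.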
Decision Framework
Use embeddings when you need semantic search — when the user types "deployment issues" and should find a doc titled "CI/CD pipeline failures." Use them when you're building RAG and need to find relevant context for an LLM. Use them when you want to cluster support tickets by topic without writing a hundred regex rules.
Don't use them when exact match works. If your users search by order ID, you don't need embeddings. You need a database index. Don't use them when your data is structured and queryable — SQL didn't stop being good just because transformers exist. And don't use them when you need to explain why a result matched. Embeddings are opaque. The vector doesn't tell you which words mattered.
Concrete recommendations:
- Budget-sensitive, under 1M docs: text-embedding-3-small (1536 dims). Cheap and good enough for most internal tools.
- Production semantic search: text-embedding-3-large (3072 dims). Better quality at a higher price, still fast.
- On-premise or privacy-constrained: sentence-transformers all-MiniLM-L6-v2. Runs locally, 384 dims, no data leaves your network. It truncates input beyond 256 word pieces, so keep chunks short.
What Your Manager Thinks It Does vs. What It Actually Does
Your manager thinks embeddings are "AI that understands our data."
What embeddings actually do is convert text to numbers in a way that preserves semantic relationships. They don't understand anything. They map language to geometry. Similar meaning equals nearby points. That's the entirety of it.
The useful reframe for your next meeting: "Embeddings let us do fuzzy matching on meaning instead of exact matching on keywords. Same concept as a search index, but it catches paraphrases and related concepts that keyword search misses."
That sentence will save you twenty minutes of whiteboarding.
Ship This Weekend
Build a semantic search engine over your team's documentation. Four hours, under ten cents in API costs, zero infrastructure.
- Export your docs to plain text — Confluence API, Notion export, or just copy the top 50 pages
- Chunk each doc into ~500-token paragraphs
- Embed each chunk with text-embedding-3-small
- Store vectors in a local ChromaDB instance
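import os
import chromadb
from chromadb.utils import embedding_functions
# Embed with text-embedding-3-small, matching the steps above; without this,
# Chroma falls back to its default local model (all-MiniLM-L6-v2).
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)
client = chromadb.Client()  # in-memory; chromadb.PersistentClient(path=...) keeps data on disk
collection = client.get_or_create_collection("team-docs", embedding_function=openai_ef)
collection.add(
    documents=["chunk text here", "another chunk"],
    ids=["doc1-chunk1", "doc1-chunk2"],  # IDs must be unique per chunk
)
# The query is embedded with the same function; the nearest chunks come back.
results = collection.query(
    query_texts=["How do we deploy to staging?"],
    n_results=5,
)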
pip install chromadb. Zero config. No Docker. No API keys for the vector store.
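The chunking step is the one part of this recipe the ChromaDB snippet doesn't cover. A minimal sketch, assuming paragraphs are separated by blank lines and approximating token counts with OpenAI's rule of thumb of roughly 0.75 English words per token; for exact counts you'd use a tokenizer such as `tiktoken`. `chunk_document` and `approx_tokens` are hypothetical helpers, and the IDs follow the `doc1-chunk1` pattern used above so the output can go straight into `collection.add`.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: about 0.75 words per token, so tokens ~ words / 0.75.
    return round(len(text.split()) / 0.75)

def chunk_document(doc_id: str, text: str, max_tokens: int = 500):
    """Greedily pack blank-line-separated paragraphs into chunks of about max_tokens."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        # Close the current chunk when adding this paragraph would exceed the budget.
        if current and approx_tokens(candidate) > max_tokens:
            chunks.append("\n\n".join(current))
            current = [para]
        else:
            current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    ids = [f"{doc_id}-chunk{i}" for i in range(1, len(chunks) + 1)]
    return chunks, ids

runbook = (
    "Deploy to staging with the release script.\n\n"
    "Roll back by redeploying the previous tag.\n\n"
    "Page the on-call engineer if health checks fail."
)
docs, ids = chunk_document("doc1", runbook, max_tokens=20)
print(ids)
```

The result plugs in as `collection.add(documents=docs, ids=ids)`. One design caveat: a single paragraph longer than `max_tokens` stays as one oversized chunk, so very long paragraphs would need a further split (by sentence, for example).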
When you run this against your own docs, the behavior is immediate: a query in plain English returns the exact paragraph from a runbook that answers it, even when query and answer share zero keywords. That's where embeddings click — not from reading about them, but from seeing your own data retrieved by meaning instead of by token overlap.
Further Reading
- What Are Embeddings? (Vicki Boykis) — The deepest technical explanation available. If you want the math, start here. Boykis writes like an engineer, not a marketer.
- OpenAI Embeddings Guide — The canonical API reference. Read the "Use cases" section and skip the rest until you need it.
- Sentence Transformers Documentation — Best open-source embedding models. Essential reading if you can't send data to an external API.
- ChromaDB Getting Started — Zero-config vector store for prototyping. You'll outgrow it. That's fine. The point is to start.