Embeddings for People Who Understand Hashing

The Concept

Embeddings are fixed-length numeric vectors that represent the meaning of text. Nearly every modern ML model that works with language uses them in some form. When someone says "vector search" or "semantic similarity," they're talking about comparing embeddings. And if you've ever used a hash function, you already understand most of how they work.

Most explanations of embeddings assume you're starting from zero. If you've built systems for a living, you aren't. You already have the mental model — it just hasn't been pointed at this primitive yet.

If You Already Know Hash Functions, You Already Know Most of This

Think about what a hash function actually does. It takes an arbitrary input — a string, a file, a stream of bytes — and produces a fixed-length output. MD5 gives you 128 bits. SHA-256 gives you 256 bits. The input can be anything. The output is always the same size. This is so fundamental to how you think about systems that you probably don't even notice you're relying on it.
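If you want to see the fixed-size contract rather than take it on faith, here is a minimal sketch using only Python's standard library. The three sample inputs are arbitrary and chosen just to span very different sizes.

```python
import hashlib

# Inputs of very different sizes, chosen only for illustration
inputs = [b"a", b"The server crashed at 3am", b"x" * 1_000_000]

def digest_sizes(items):
    """Return (md5_bits, sha256_bits) for each input."""
    return [
        (hashlib.md5(item).digest_size * 8, hashlib.sha256(item).digest_size * 8)
        for item in items
    ]

sizes = digest_sizes(inputs)
for item, (md5_bits, sha_bits) in zip(inputs, sizes):
    print(f"{len(item):>9} input bytes -> MD5 {md5_bits} bits, SHA-256 {sha_bits} bits")
```

One byte or a million bytes in, the digest width never changes.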

An embedding function has the same shape. It takes text (or an image, or audio) and produces a fixed-length vector. OpenAI's text-embedding-3-small gives you 1536 floats by default. Cohere's embed-english-v3.0 gives you 1024. The input can be a word, a sentence, or a full document, up to the model's input token limit. The output is always the same dimensionality.

The parallels are striking:

Hash function	Embedding function
Arbitrary input → fixed output	Arbitrary input → fixed output
Deterministic	Deterministic for a fixed model version (up to floating-point noise)
Output is compact	Output is compact
You compare outputs to check for equality	You compare outputs to check for similarity

Read that last row again. That's where the entire value of embeddings lives.

What's Actually New

Here's where the analogy breaks, and it breaks in the most important way.

Cryptographic hash functions are designed so that similar inputs produce completely different outputs. Change one bit of the input and roughly half the output bits flip, so the hash is unrecognizable. That's the point. It's a security property. You rely on it every time you store a password or verify a download.
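A quick way to watch that property in action is to hash two inputs that differ by a single character and count how many output bits changed. This is a small sketch with the standard library; the two sample strings are just illustrations.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

# Two inputs that differ by a single character
changed = differing_bits(b"The server crashed at 3am", b"The server crashed at 4am")
print(f"{changed} of 256 output bits differ")
```

For a well-behaved hash the count should land somewhere around half of the 256 bits, which is exactly what you'd expect from two unrelated inputs.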

Embedding functions are designed for the exact opposite. Similar inputs produce similar outputs. "The server crashed at 3am" and "Our backend went down overnight" should produce vectors that are close together in high-dimensional space. The entire value proposition is that meaning is preserved in the geometry.

This is genuinely new. Not "new" in the way most AI things are new, which is to say, a rebranding of something that already existed. This is a fundamentally different contract between input and output. And it enables things that exact-match tools can't do:

Finding documents that are about the same thing, even when they use different words
Ranking results by relevance instead of binary match/no-match
Clustering data by meaning without defining the categories upfront

The distance between two vectors tells you how semantically similar the inputs are. Cosine similarity is the standard measure, and it's exactly as simple as it sounds:

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

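Two sentences with similar meaning score close to 1.0. Unrelated sentences score noticeably lower. The theoretical range is -1 to 1, but real embedding models rarely produce strongly negative scores, and "unrelated" often lands somewhat above 0 rather than at it, so compare scores relative to each other instead of against a fixed threshold. That's the whole interface.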
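To get a feel for the scale, it helps to run the function on vectors where you already know the answer. The sketch below uses hand-made 3-dimensional vectors, not real embeddings, so the geometry is easy to check by eye.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hand-made 3-dimensional vectors, not real embeddings
v = np.array([1.0, 2.0, 3.0])
same_direction = 2 * v                    # longer, but pointing the same way
orthogonal = np.array([3.0, 0.0, -1.0])   # dot product with v is 3 + 0 - 3 = 0
opposite = -v

print(cosine_similarity(v, same_direction))
print(cosine_similarity(v, orthogonal))
print(cosine_similarity(v, opposite))
```

The three scores come out at approximately 1, 0, and -1. Note that doubling the length of a vector doesn't change the score: cosine similarity only cares about direction.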

Under the Hood

The practical reality of using embeddings is anticlimactic. That's a feature.

from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Kubernetes pod scheduling"
)

vector = response.data[0].embedding  # List of 1536 floats

That's it. You now have a point in 1536-dimensional space. Store it somewhere. When a query comes in, embed the query the same way and find the nearest stored vectors. This is the core pattern behind most "AI-powered search" features shipped in the last two years.
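The "find the nearest stored vectors" step can be sketched as a brute-force scan with NumPy. The function name top_k and the 4-dimensional toy vectors are assumptions for illustration; real vectors would come from the embedding call, and a vector database would replace the linear scan once you have more than a few hundred thousand of them.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=3):
    """Return indices and scores of the k stored vectors most similar to the query (cosine)."""
    docs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    scores = docs @ query                 # cosine similarity against every stored vector
    order = np.argsort(-scores)[:k]       # highest scores first
    return order.tolist(), scores[order].tolist()

# Toy 4-dimensional "embeddings"; real ones would come from the embedding API
stored = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.8, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])

indices, scores = top_k(query, stored, k=2)
print(indices, scores)
```

Rows 0 and 2 point mostly along the same axis as the query, so they come back first.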

The thing that matters for systems engineers, and that few tutorials mention, is that embedding generation behaves like a pure function with no side effects. Same model version plus same input gives the same vector, up to tiny floating-point differences on some hosted APIs. Which means you can cache aggressively. You can batch. You can run it offline during a nightly job and store the results. You can put it behind the same caching layer you'd put any other deterministic computation behind.
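Here is one way that caching layer might look, and it's a nice place for the two primitives to meet: the cache key is a hash of the model name and the input text. EmbeddingCache and fake_embed are hypothetical names for this sketch; fake_embed stands in for a real embedding call so the example runs without network access.

```python
import hashlib
import json

class EmbeddingCache:
    """Memoize an embedding function, keyed by a hash of (model, text)."""

    def __init__(self, embed_fn, model):
        self.embed_fn = embed_fn   # hypothetical callable: text -> list of floats
        self.model = model
        self.store = {}            # swap for Redis, SQLite, etc. in a real system
        self.misses = 0

    def _key(self, text):
        # Include the model name so vectors from different models never collide
        payload = json.dumps([self.model, text]).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self, text):
        key = self._key(text)
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed_fn(text)
        return self.store[key]

# Stand-in embedder so the sketch runs offline; not a real embedding model
def fake_embed(text):
    return [float(len(text)), float(text.count(" "))]

cache = EmbeddingCache(fake_embed, model="text-embedding-3-small")
first = cache.get("Kubernetes pod scheduling")
second = cache.get("Kubernetes pod scheduling")
print(cache.misses)
```

The second lookup never reaches the embedder. Keying on the model name matters: vectors from different models live in different spaces and can't be compared.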

The cost profile makes this practical. Embedding is far cheaper per token than an LLM completion call, on the order of 100x depending on the models compared. A small local model can embed short text in milliseconds; a hosted API adds a network round trip on top. The expensive part is storing and searching millions of vectors efficiently, which is where vector databases come in. That's next week.
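To put a rough number on the storage side, here is a back-of-the-envelope calculation. It assumes vectors are stored as 4-byte float32 values and counts only the raw vectors, not index structures or metadata; the dimension counts are those of the models mentioned in this post.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    """Bytes needed for the raw vectors alone (float32 by default), excluding index overhead."""
    return num_vectors * dims * bytes_per_float

for dims in (384, 1536, 3072):
    gb = raw_vector_bytes(1_000_000, dims) / 1e9
    print(f"1M vectors x {dims} dims ~ {gb:.2f} GB")
```

At 1536 dimensions, a million vectors is about 6 GB before any index is built, which is why dimensionality is a real cost lever and not just a quality knob.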

Decision Framework

Use embeddings when you need semantic search — when the user types "deployment issues" and should find a doc titled "CI/CD pipeline failures." Use them when you're building RAG and need to find relevant context for an LLM. Use them when you want to cluster support tickets by topic without writing a hundred regex rules.

Don't use them when exact match works. If your users search by order ID, you don't need embeddings. You need a database index. Don't use them when your data is structured and queryable — SQL didn't stop being good just because transformers exist. And don't use them when you need to explain why a result matched. Embeddings are opaque. The vector doesn't tell you which words mattered.
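In practice the two approaches often sit side by side behind one search box. The sketch below routes a query to an exact lookup when it looks like an identifier and to semantic search otherwise. The ORD-123456 order ID format and the route_query name are made up for illustration; substitute your own scheme.

```python
import re

# Hypothetical order ID format, e.g. "ORD-123456"; adjust to your own scheme
ORDER_ID = re.compile(r"ORD-\d{6}")

def route_query(query: str) -> str:
    """Return which backend should handle the query: 'exact' or 'semantic'."""
    if ORDER_ID.fullmatch(query.strip()):
        return "exact"      # database index lookup
    return "semantic"       # embed the query and search by vector similarity

print(route_query("ORD-123456"))
print(route_query("deployment issues"))
```

The cheap, explainable path gets first refusal, and embeddings only handle the queries that actually need them.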

Concrete recommendations:

Budget-sensitive, under 1M docs: text-embedding-3-small (1536 dims). Cheap and good enough for most internal tools.
Production semantic search: text-embedding-3-large (3072 dims). Better quality, still fast.
On-premise or privacy-constrained: sentence-transformers all-MiniLM-L6-v2. Runs locally, 384 dims, no data leaves your network.

What Your Manager Thinks It Does vs. What It Actually Does

Your manager thinks embeddings are "AI that understands our data."

What embeddings actually do is convert text to numbers in a way that preserves semantic relationships. They don't understand anything. They map language to geometry. Similar meaning equals nearby points. That's the entirety of it.

The useful reframe for your next meeting: "Embeddings let us do fuzzy matching on meaning instead of exact matching on keywords. Same concept as a search index, but it catches paraphrases and related concepts that keyword search misses."

That sentence will save you twenty minutes of whiteboarding.

Ship This Weekend

Build a semantic search engine over your team's documentation. Four hours, under ten cents in API costs, zero infrastructure.

Export your docs to plain text — Confluence API, Notion export, or just copy the top 50 pages
Chunk each doc into ~500-token paragraphs
Embed each chunk with text-embedding-3-small
Store vectors in a local ChromaDB instance

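# Requires: pip install chromadb openai, and OPENAI_API_KEY in the environment
import os

import chromadb
from chromadb.utils import embedding_functions

# Without this, Chroma embeds with its bundled local default model,
# not text-embedding-3-small
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path=...) to keep data
collection = client.create_collection("team-docs", embedding_function=openai_ef)

# Each document is embedded by openai_ef when it is added
collection.add(
    documents=["chunk text here", "another chunk"],
    ids=["doc1-chunk1", "doc1-chunk2"]
)

# The query is embedded with the same model, then matched by vector distance
results = collection.query(
    query_texts=["How do we deploy to staging?"],
    n_results=5
)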

pip install chromadb (plus openai for the embedding calls). No Docker. No server to run. No API keys for the vector store itself.
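The chunking step from the list above is the only part that needs code of your own. A minimal sketch, assuming paragraphs are separated by blank lines and approximating tokens by whitespace-separated words (which undercounts what a real tokenizer would report); chunk_text is a hypothetical helper, not part of any library.

```python
def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Greedily pack paragraphs into chunks of at most ~max_tokens.

    Token counts are approximated by whitespace-separated words. A single
    paragraph longer than max_tokens becomes its own oversized chunk.
    """
    chunks, current, current_len = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the budget
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

doc = "\n\n".join(["word " * 200] * 4)   # four 200-word paragraphs
chunks = chunk_text(doc, max_tokens=500)
print([len(c.split()) for c in chunks])
```

Splitting on paragraph boundaries rather than at a fixed character offset keeps each chunk readable on its own, which matters when the chunk is what gets shown to the user or handed to an LLM.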

When you run this against your own docs, the payoff is immediate: a query in plain English can return the paragraph from a runbook that answers it, even when query and answer share almost no keywords. That's where embeddings click: not from reading about them, but from seeing your own data retrieved by meaning instead of by token overlap.