The Concept
Embeddings are fixed-length numeric vectors that represent the meaning of text, images, or other data. Nearly every ML model that works with language uses them under the hood. When someone says "vector search" or "semantic similarity," they're talking about comparing embeddings.
You'll see them everywhere: recommendation systems, search, RAG pipelines, clustering. If you're building anything with an LLM, you'll use embeddings directly or you'll use something that depends on them.
If You Already Know Hash Functions, You Already Know Most of This
A hash function takes arbitrary input and produces a fixed-length output. MD5 gives you 128 bits. SHA-256 gives you 256 bits. The input can be anything — a string, a file, a stream of bytes. The output is always the same size.
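You can check that fixed-length property in a few lines with Python's standard hashlib:

```python
import hashlib

# Inputs of wildly different sizes...
short = b"hi"
big = b"x" * 10_000_000

# ...always hash to the same number of bits.
print(len(hashlib.md5(short).digest()) * 8)     # 128
print(len(hashlib.md5(big).digest()) * 8)       # 128
print(len(hashlib.sha256(short).digest()) * 8)  # 256
print(len(hashlib.sha256(big).digest()) * 8)    # 256
```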
An embedding function does the same thing. It takes text (or an image, or audio) and produces a fixed-length vector. OpenAI's text-embedding-3-small gives you 1536 floats. Cohere's embed-v3 gives you 1024. The input can be a word, a sentence, or a full document. The output is always the same dimensionality.
Here's where it maps cleanly:
| Hash function | Embedding function |
|---|---|
| Arbitrary input → fixed output | Arbitrary input → fixed output |
| Deterministic (same input → same output) | Deterministic (same input → same output) |
| Output is compact | Output is compact |
| You compare outputs to check for equality | You compare outputs to check for similarity |
That last row is where it gets interesting.
What's Actually New
Hash functions are designed so that similar inputs produce completely different outputs. Change one bit in the input and the hash is unrecognizable. That's the point — it's a security property.
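You can see that avalanche property directly. Change one character and the SHA-256 digests share essentially nothing:

```python
import hashlib

a = hashlib.sha256(b"The server crashed at 3am").hexdigest()
b = hashlib.sha256(b"The server crashed at 4am").hexdigest()

print(a)
print(b)

# Count hex positions where the two digests agree. For unrelated
# 64-char hex strings you'd expect only ~4 matches by pure chance.
matches = sum(x == y for x, y in zip(a, b))
print(matches)
```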
Embedding functions are designed for the opposite. Similar inputs produce similar outputs. "The server crashed at 3am" and "Our backend went down overnight" should produce vectors that are close together in vector space. That's the whole value proposition.
The distance between two embedding vectors tells you how semantically similar the inputs are. Cosine similarity is the standard measure:
```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two similar sentences will have similarity close to 1.0.
# Two unrelated sentences will be close to 0.0.
```
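A toy check with hand-made vectors (stand-ins for real embeddings) shows the behavior:

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([1.1, 2.1, 2.9])    # nearly the same direction as v1
v3 = np.array([-3.0, 0.5, -1.0])  # pointing somewhere else entirely

print(cosine_similarity(v1, v2))  # close to 1.0
print(cosine_similarity(v1, v3))  # well below zero here
```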
This means you can do things that are impossible with hashes:
- Find documents that are about the same thing, even if they use different words
- Rank results by relevance, not just match/no-match
- Cluster data by meaning without defining the categories upfront
Under the Hood
An embedding model is a neural network (usually a transformer) that's been trained on massive text corpora. During training, it learns to place semantically similar text close together in a high-dimensional space.
The practical workflow:
```python
from openai import OpenAI

client = OpenAI()

# Generate an embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Kubernetes pod scheduling",
)
vector = response.data[0].embedding  # List of 1536 floats
```
That vector is now a point in 1536-dimensional space. Store it. When a query comes in, embed the query the same way and find the nearest stored vectors.
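At small scale, "find the nearest stored vectors" is a single matrix multiply. A brute-force sketch with NumPy, using random vectors as stand-ins for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are stored document embeddings (100 docs x 1536 dims),
# unit-normalized once so cosine similarity reduces to a dot product.
docs = rng.normal(size=(100, 1536))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# Pretend this is the embedded query
query = rng.normal(size=1536)
query /= np.linalg.norm(query)

# Cosine similarity against every stored doc at once
sims = docs @ query
top5 = np.argsort(sims)[::-1][:5]  # indices of the 5 nearest docs
print(top5, sims[top5])
```

This linear scan is fine up to tens of thousands of vectors; vector databases exist to make the same lookup fast at millions.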
The key insight for systems engineers: embedding generation is effectively a pure function with no side effects. Same model + same input = same vector (some hosted APIs show tiny floating-point jitter, but nothing that matters for caching). You can cache aggressively. You can batch. You can run it offline and store the results.
The cost profile: embedding is cheap (roughly 100x cheaper than an LLM completion call) and fast (milliseconds of model compute for short text, though network round-trips dominate when you call a hosted API). The expensive part is storing and searching millions of vectors efficiently, which is where vector databases come in. That's next week.
Decision Framework
Use embeddings when:
- You need semantic search (not just keyword matching)
- You're building a RAG pipeline and need to find relevant context
- You want to cluster or classify text without predefined rules
- Your search queries won't match the exact words in the documents
Don't use embeddings when:
- Exact match is sufficient (use a hash or a database index)
- Your data is structured and queryable (use SQL)
- You need explanations for why results matched (embeddings are opaque)
- Your corpus is under 1,000 documents and keyword search works fine
Model recommendations:
- Under 1M documents, budget-sensitive: text-embedding-3-small (1536 dims, cheap)
- Production semantic search: text-embedding-3-large (3072 dims, better quality)
- On-premise / privacy requirements: sentence-transformers all-MiniLM-L6-v2 (runs locally, 384 dims)
What Your Manager Thinks It Does vs. What It Actually Does
What your manager thinks: "Embeddings are AI that understands our data."
What it actually does: Converts text to numbers in a way that preserves semantic relationships. It doesn't understand anything. It maps language to geometry. Similar meaning = nearby points. That's it.
The useful reframe for your next meeting: "Embeddings let us do fuzzy matching on meaning instead of exact matching on keywords. Same concept as a search index, but it catches paraphrases and related concepts."
Ship This Weekend
Build a semantic search engine over your team's documentation in under 4 hours.
- Export your docs to plain text (Confluence API, Notion export, or just copy-paste the top 50 pages)
- Chunk each doc into ~500-token paragraphs
- Embed each chunk with text-embedding-3-small
- Store vectors in a local ChromaDB instance (pip install chromadb, zero config)
- Build a CLI that takes a question, embeds it, and returns the top 5 matching chunks
Total cost: under $0.10 in API calls for 50 docs. Zero infrastructure.
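The chunking step can be as simple as a whitespace split. A rough sketch that approximates ~500 tokens as ~375 words (using the common ~0.75 words-per-token rule of thumb for English):

```python
def chunk_words(text: str, max_words: int = 375) -> list[str]:
    """Rough chunker: ~375 words approximates ~500 tokens of English text."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

doc = "word " * 1000               # a 1000-word stand-in document
chunks = chunk_words(doc)
print(len(chunks))                 # 3 chunks: 375 + 375 + 250 words
print(len(chunks[0].split()))      # 375
```

A real pipeline would split on paragraph boundaries first and use a tokenizer for exact counts, but this is good enough for a weekend prototype.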
```python
import chromadb

client = chromadb.Client()
collection = client.create_collection("team-docs")

# Add your chunks. Note: ChromaDB embeds documents with its built-in
# default model unless you pass precomputed embeddings explicitly.
collection.add(
    documents=["chunk text here", "another chunk"],
    ids=["doc1-chunk1", "doc1-chunk2"],
)

# Query
results = collection.query(
    query_texts=["How do we deploy to staging?"],
    n_results=5,
)
```
When you demo this on Monday, your team will get it immediately. "Oh, it's like search but it actually finds what I meant." That's the moment embeddings click.
Further Reading
- OpenAI Embeddings Guide — The canonical reference. Read the "Use cases" section.
- Sentence Transformers Documentation — Best open-source embedding models. Essential if you can't send data to an API.
- What Are Embeddings? (Vicki Boykis) — The deepest technical explanation available. Read this if you want to understand the math.
- ChromaDB Getting Started — Zero-config vector store for prototyping. You'll outgrow it, but it's the fastest way to start.