Vector embeddings are numerical representations of text, images, or other data that capture meaning, so similar content ends up with similar vectors and can be matched by concept rather than exact words.
What an embedding is
A vector embedding is a way of turning a piece of content into a list of numbers that captures its meaning. Feed a sentence, paragraph, or document to an embedding model and it returns a vector, typically a few hundred to a few thousand numbers long.
The magic is in what those numbers represent. Content with similar meaning produces similar vectors, even if the exact words are completely different. “How do I speed up my database?” and “My SQL queries are running slow” share almost no words but would produce embeddings that sit close together in vector space.
This is the foundation of modern semantic search, RAG, and most AI retrieval systems.
Why this matters
Traditional search is literal. You type a word, the system finds documents containing that word. If you do not remember the exact phrasing, you are stuck.
Embeddings let you search by concept. You describe what you are looking for in your own words, and the system finds content that means the same thing, regardless of vocabulary. This is especially powerful for personal collections. You saved something weeks ago. You do not remember the words. You remember the idea. Embeddings bridge that gap.
For a bookmark manager like ContextBolt, this is the difference between useful and useless at scale. With 1,000 saves, keyword search fails constantly. With semantic search over embeddings, you can describe what you remember and the system does the matching.
How embeddings are created
Embedding models are neural networks trained to place similar content close together in vector space. The training objective is usually contrastive: given pairs of similar items, pull their vectors together; given pairs of different items, push them apart. After enough training, the model learns a geometry where distance equals semantic distance.
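A minimal sketch of that contrastive objective, using a margin loss over dot-product similarity. The function names and margin value are illustrative, not taken from any particular training framework:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_margin_loss(anchor, positive, negative, margin=0.5):
    # Zero loss once the anchor is more similar to the positive
    # than to the negative by at least `margin`; otherwise the
    # loss is positive, giving the optimiser a gradient to pull
    # the positive closer and push the negative away.
    return max(0.0, margin - dot(anchor, positive) + dot(anchor, negative))

# Well-separated pair: loss is zero, nothing left to learn.
print(contrastive_margin_loss([1, 0], [0.9, 0.1], [0, 1]))
# Negative sits too close to the anchor: positive loss drives training.
print(contrastive_margin_loss([1, 0], [0.9, 0.1], [0.8, 0.2]))
```

Real training objectives (such as InfoNCE) are softmax-based variants of the same idea, applied over large batches of positive and negative pairs.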
Popular embedding models as of 2026:
- OpenAI text-embedding-3-small and text-embedding-3-large: widely used, good quality, paid API
- Cohere embed-v3: strong multilingual support
- BGE and E5 families: open-source, competitive quality, run locally
- Voyage AI: specialised for retrieval, often used in production RAG
Choice of model matters. A better embedding model means better retrieval, which means better final answers from whatever AI system sits on top.
Cosine similarity and nearest neighbours
Once you have embeddings, you need to find which ones are close together. The standard metric is cosine similarity, which measures the angle between two vectors. Values near 1 mean highly similar; values near 0 mean unrelated. Negative values (directly opposed vectors) are possible but uncommon with modern embedding models.
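As a concrete sketch, here is cosine similarity in plain Python. No external libraries; a real system would use numpy or the vector database's built-in distance functions:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors, divided by the product of
    # their lengths: this isolates the angle between them.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction -> similarity of 1.0 (up to floating-point error)
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# Orthogonal vectors -> similarity of 0.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```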
For small collections you can compute similarity directly. For larger ones, you need a vector database. These databases are optimised for approximate nearest-neighbour search, finding the top-k closest vectors to a query vector in milliseconds, even across millions of entries. Without one, retrieval at scale would be prohibitively slow.
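For a sense of what the database replaces, here is the exact brute-force version. It is fine for small collections but scans every vector per query, which is exactly the linear cost that approximate indexes such as HNSW or IVF avoid. The function names are illustrative:

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def top_k(query_vec, vectors, k=3):
    # Score every stored vector against the query and keep the k best.
    # A vector database replaces this O(n) scan with an approximate
    # nearest-neighbour index so it stays fast at millions of entries.
    scored = ((cosine_similarity(query_vec, v), i)
              for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)

vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
# Returns (similarity, index) pairs, best first.
print(top_k([1.0, 0.0], vectors, k=2))
```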
Embeddings in a personal knowledge system
The typical flow for something like semantic bookmarking:
- You save content. The system extracts the text.
- Text gets split into chunks and passed through an embedding model.
- The resulting vectors are stored alongside the original content.
- When you query, your query gets embedded too.
- The system finds the chunks whose vectors are closest to your query vector.
- Those chunks get returned, usually as context for an AI to summarise or reason over.
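The steps above can be sketched end to end. The embedding model here is a deliberately crude stand-in (a hashed bag-of-words), chosen only so the example runs without a neural network; every name in it (`toy_embed`, `TinyIndex`, and so on) is illustrative rather than anything ContextBolt actually uses:

```python
import math
import re

def toy_embed(text, dim=64):
    # Stand-in for a real embedding model: a normalised, hashed
    # bag-of-words vector. A real system would call a neural model
    # here; this toy only illustrates the data flow, not semantics.
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def chunk(text, size=100):
    # Split text into fixed-size word windows before embedding.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

class TinyIndex:
    def __init__(self):
        self.entries = []  # (chunk_text, vector) pairs

    def save(self, text):
        # Steps 1-3: extract, chunk, embed, store.
        for c in chunk(text):
            self.entries.append((c, toy_embed(c)))

    def search(self, query, k=2):
        # Steps 4-6: embed the query, rank stored chunks by similarity.
        # Vectors are unit-normalised, so dot product equals cosine.
        qv = toy_embed(query)
        dot = lambda a, b: sum(x * y for x, y in zip(a, b))
        return sorted(self.entries,
                      key=lambda e: dot(qv, e[1]), reverse=True)[:k]

index = TinyIndex()
index.save("Postgres query tuning: add indexes to speed up slow database queries.")
index.save("A recipe for sourdough bread with a long cold fermentation.")
print(index.search("slow database queries", k=1)[0][0])
```

In production, `TinyIndex` would be a vector database and `toy_embed` an embedding model API call, but the shape of the flow is the same.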
Every step is invisible to the user. You save things. You ask questions. The embeddings do the work in between.
ContextBolt uses exactly this pattern under the hood, with the added twist that the retrieval step is exposed to AI assistants through the Model Context Protocol. The AI does not need to know embeddings exist. It just gets a search tool that works on meaning.