Semantic Search is a search method that retrieves content based on meaning rather than exact keyword matches, typically powered by vector embeddings and similarity scoring.
What semantic search actually means
Semantic search is the technique of finding documents whose meaning matches a query, rather than documents that merely share its words. The query “ways to reduce model hallucination” should return documents about grounding, retrieval-augmented generation, and factuality benchmarks, even when those documents never use the word “hallucination”.
This sounds obvious, but it’s a real shift from how search worked for the previous 25 years.
How it differs from keyword search
Keyword search looks for literal token matches. Sometimes with stemming (so “running” matches “run”), sometimes with synonyms (so “car” matches “automobile” if a synonym list is configured). The system has no understanding of meaning. It’s pattern matching on text.
Semantic search converts both query and documents into vector embeddings. Embeddings are numerical representations where similar meanings produce similar vectors. The system retrieves the documents whose vectors are closest to the query vector using a similarity measure.
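To make “closest vector” concrete, here is a minimal sketch of cosine similarity, the most common similarity measure, in plain Python. The four-dimensional vectors are invented toy values standing in for real embeddings, which have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (real models produce far more dimensions).
query      = [0.9, 0.1, 0.0, 0.2]   # "reduce hallucination"
doc_rag    = [0.8, 0.2, 0.1, 0.3]   # about retrieval-augmented generation
doc_recipe = [0.0, 0.9, 0.8, 0.1]   # about cooking

print(cosine_similarity(query, doc_rag))     # high: similar meaning
print(cosine_similarity(query, doc_recipe))  # low: unrelated
```

Cosine similarity ranges from -1 to 1; retrieval systems rank documents by this score and return the highest.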
The result: documents come back even when no shared words exist between query and text. That’s the whole point.
How it works under the hood
A semantic search system has three components.
Embedding model. A neural network that turns text into vectors. Modern embedding models (OpenAI text-embedding-3, Cohere embed-v3, BGE, e5, Nomic) produce vectors in 384 to 4096 dimensions. The model is trained so that text with similar meaning ends up with similar vectors.
Vector store. A database optimised for similarity search at scale. Dedicated options include Pinecone, Weaviate, and Qdrant; pgvector adds the same capability to an existing Postgres database. The vector store handles indexing and approximate nearest-neighbour search efficiently.
Query pipeline. When a query comes in, it gets embedded with the same model used at indexing time. The vector store returns the top K most similar document vectors. The system retrieves the original text and returns it to the caller.
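The three components above can be sketched end to end with a brute-force in-memory index. This is an illustrative toy: the hand-written vectors stand in for an embedding model’s output, and the Python list stands in for a vector store with a real ANN index.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Pretend "index": (vector, original text) pairs produced at indexing time.
# A real system would call an embedding model and a vector store here.
index = [
    ([0.9, 0.1, 0.1], "Grounding LLM answers with retrieval"),
    ([0.2, 0.9, 0.1], "A history of typewriters"),
    ([0.8, 0.2, 0.2], "Reducing factual errors in generated text"),
]

def search(query_vector, k=2):
    # Score every document against the query, return the top K by similarity.
    scored = sorted(index, key=lambda item: cosine(query_vector, item[0]), reverse=True)
    return [text for _, text in scored[:k]]

# The query vector here plays the role of an embedded user query.
print(search([0.85, 0.15, 0.1]))
```

A production system replaces the linear scan with approximate nearest-neighbour search, which trades a little accuracy for sublinear lookup time.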
This is the foundation of retrieval-augmented generation, where retrieved chunks become context for an AI model’s answer.
Where keyword search still wins
Semantic search isn’t always the right tool. A few cases where keyword search beats it:
- Exact match queries. Product SKUs, error codes, specific filenames. You don’t want fuzzy semantic matching here. You want exact.
- Rare technical terms. If a document mentions an obscure library by name, semantic search may not surface it for a query that uses that exact name. Keyword search catches it instantly.
- Latency-sensitive paths. Keyword lookups against an inverted index are extremely fast. Semantic search must first embed the query (a model inference call) and then search the vector store, which is slower and more expensive per query.
Most production systems use hybrid search: run both in parallel, combine results with a reranker. This usually beats either approach alone.
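One common way to combine the two result lists is reciprocal rank fusion (RRF): each document’s fused score is the sum of 1/(k + rank) across the lists it appears in. A minimal sketch, with invented document names and the conventional k = 60 constant:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: ranked result lists (best first), one per retriever.
    # A document appearing high in several lists accumulates a high score.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs from a keyword retriever and a semantic retriever.
keyword_results  = ["doc_sku_page", "doc_manual", "doc_blog"]
semantic_results = ["doc_blog", "doc_manual", "doc_faq"]

print(reciprocal_rank_fusion([keyword_results, semantic_results]))
```

RRF needs no score calibration between retrievers, which is why it is a popular default; a learned reranker can then reorder the fused top results.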
The personal-search use case
For most people, semantic search shows up in two places now.
First, AI assistants. When you ask Claude or ChatGPT a question, they often retrieve relevant context from a knowledge base via semantic search before answering.
Second, personal knowledge tools. ContextBolt uses semantic search across your saved bookmarks. When you ask “what did I save about React state management”, the system isn’t matching the literal phrase. It’s matching the meaning of your query against the meaning of every bookmark you’ve saved. That’s why it works for queries you’d never have known to save under that exact wording.
The same approach extends to anything you save and want to retrieve later: notes, papers, threads, articles. Semantic search makes archives searchable by intent rather than by remembered keywords.
Limitations to know about
Semantic search has known weaknesses.
It hallucinates relevance. A document can come back as “semantically similar” while being practically useless. This is most common when the embedding model saw little training data from your domain.
It’s only as good as the chunking. Long documents get split into chunks before embedding. Bad chunking (cutting mid-paragraph, losing context) degrades retrieval quality. Most engineering effort in production semantic search goes into chunking strategy.
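A minimal sketch of paragraph-aware chunking: split on blank lines so no chunk cuts a paragraph in half, then pack consecutive paragraphs up to a size limit. The limit and the sample text are illustrative; real chunkers usually also add overlap between chunks.

```python
def chunk_by_paragraph(text, max_chars=200):
    # Split on blank lines so chunks never cut a paragraph in half,
    # then pack consecutive paragraphs until the size limit is reached.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = ("First paragraph about embeddings.\n\n"
       "Second paragraph about vector stores.\n\n"
       "Third paragraph about reranking.")
for chunk in chunk_by_paragraph(doc, max_chars=80):
    print("---\n" + chunk)
```

Splitting on fixed character counts instead would be simpler but can cut a sentence mid-thought, which is exactly the failure mode described above.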
It struggles with negation and precise constraints. Queries like “show me articles that DON’T mention React” don’t translate well to vector similarity. For those, structured filters and keyword exclusions help.
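Exclusion constraints are easiest to enforce as a literal post-filter on the retrieved candidates rather than in the vector query itself. A minimal sketch, with invented result records:

```python
def exclude_keyword(results, term):
    # Vector similarity can't express "NOT React"; a literal filter can.
    return [r for r in results if term.lower() not in r["text"].lower()]

# Hypothetical candidates returned by a semantic search step.
results = [
    {"text": "Managing state in React apps"},
    {"text": "State machines in plain JavaScript"},
]
print(exclude_keyword(results, "react"))
```

In practice the same idea applies to structured metadata (dates, tags, sources): retrieve semantically, then filter precisely.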
These aren’t reasons to avoid semantic search. They’re reasons to combine it with other techniques and to evaluate retrieval quality on your specific data.