Semantic Search: How It Works

By James Killick, November 27, 2024

TL;DR: Semantic search finds results based on what a user means, not just the words they type. It uses vector embeddings to match meaning across text. If your app still runs keyword search, it is returning worse results than it should.

Semantic search finds results based on meaning. Type 'I need help with my bill' and a keyword search looks for the words 'help' and 'bill'. A semantic search recognises that you might mean a utility invoice, a government payment, or a phone plan. It matches what you meant.

That difference matters a lot once real users touch your product.

What is the actual difference between keyword search and semantic search?

Keyword search is pattern matching. It scans text for the exact words in the query and returns documents that contain them. Fast, simple, and widely used. Also brittle.

If a user types 'fix login problem' and your docs say 'resolve authentication error', keyword search returns nothing. The meaning is identical. The words are not.
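The failure is easy to reproduce. Below is a minimal sketch, assuming a naive matcher that requires every query word to appear in the document. The `keyword_search` function and the sample documents are hypothetical and not taken from any particular search engine.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query: str, docs: list[str]) -> list[str]:
    """Return the docs that contain every word in the query (naive AND match)."""
    query_words = tokenize(query)
    return [doc for doc in docs if query_words <= tokenize(doc)]


docs = [
    "How to resolve authentication error on sign-in",
    "Fix login problem after password reset",
]

# Only the doc that shares the literal words comes back.
# The first doc describes the same issue but shares no words with the query.
print(keyword_search("fix login problem", docs))
```

Real keyword engines add stemming and synonym lists, which narrows the gap but still depends on someone anticipating the phrasing in advance.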

Semantic search converts both the query and the documents into vectors. A vector is a list of numbers that represents the meaning of text in a high-dimensional space. Similar meanings land close together in that space. So 'fix login problem' and 'resolve authentication error' end up near each other, and the right result comes back.

The model doing the conversion is called an embedding model. OpenAI, Cohere, and open-source options like Sentence Transformers all produce them. You pick one, run your content through it, store the vectors, and query against them at search time.

How do vector embeddings work?

An embedding model is a neural network trained on large amounts of text. During training it learns which words, phrases, and sentences tend to appear in similar contexts. That context knowledge gets compressed into a fixed-length vector, typically a few hundred to a few thousand numbers. `all-MiniLM-L6-v2` produces 384, and OpenAI's `text-embedding-3-small` produces 1536 by default.

Words that mean similar things end up with vectors that point in similar directions. 'Dog' and 'puppy' will be close. 'Dog' and 'invoice' will be far apart.

When a user runs a search, their query gets converted to a vector in real time. You then do a similarity search across your stored vectors. Cosine similarity is the most common measure. You get back the top N results ranked by how close their vectors are to the query vector.
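Here is a rough sketch of that ranking step in plain Python. The three-dimensional vectors are hand-made for illustration only. Real embeddings come from a model and have hundreds of dimensions. The `top_n` helper is a hypothetical name, not a library function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def top_n(query_vec: list[float], index: dict[str, list[float]], n: int = 2):
    """Score every stored vector against the query and keep the n closest."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(doc_id, round(score, 3)) for score, doc_id in scored[:n]]


# Illustrative vectors: the first axis loosely stands for "animal",
# the last for "finance". These values are made up.
index = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

# A query vector pointing along the "animal" axis ranks dog and puppy first.
print(top_n([1.0, 0.0, 0.0], index))
```

This version compares the query against every stored vector. Vector databases avoid that full scan by using approximate nearest-neighbour indexes.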

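The similarity search itself takes milliseconds once your index is built. The index is the stored set of vectors for all your content. Tools like Pinecone, Weaviate, Qdrant, and pgvector handle that storage and retrieval efficiently, usually with approximate nearest-neighbour indexes so a query does not have to be compared against every vector.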

What does it take to add semantic search to an existing app?

If you already have content stored in a database, the steps are straightforward.

1. Pick an embedding model. OpenAI's `text-embedding-3-small` is a solid starting point. It is cheap and accurate.
2. Run your existing content through the model to generate vectors.
3. Store those vectors in a vector database or a vector column in your existing Postgres database (pgvector).
4. At query time, embed the user's search input and run a similarity search against your stored vectors.
5. Return the matched records.
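The steps above can be sketched end to end with an in-memory store. This is an illustration under stated assumptions, not a production design. `SemanticIndex` is a hypothetical stand-in for a vector database, and `toy_embed` is a placeholder that counts hand-picked topic words so the example runs without an API key. In a real build, `toy_embed` would be replaced by a call to your embedding model.

```python
import math
from typing import Callable

Vector = list[float]


def normalize(vec: Vector) -> Vector:
    """Scale to unit length so a plain dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length else vec


class SemanticIndex:
    """In-memory stand-in for a vector store (hypothetical, for illustration)."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.rows: dict[str, Vector] = {}

    def add(self, record_id: str, text: str) -> None:
        # Embed the content once and store the vector against the record id.
        self.rows[record_id] = normalize(self.embed(text))

    def search(self, query: str, limit: int = 3) -> list[str]:
        # Embed the query, then rank stored vectors by dot product.
        q = normalize(self.embed(query))
        ranked = sorted(
            self.rows,
            key=lambda rid: sum(a * b for a, b in zip(q, self.rows[rid])),
            reverse=True,
        )
        return ranked[:limit]


# Placeholder embedder: one dimension per hand-picked topic word list.
TOPICS = [
    {"login", "authentication", "password"},
    {"invoice", "bill", "payment"},
]


def toy_embed(text: str) -> Vector:
    words = text.lower().split()
    return [float(sum(w in topic for w in words)) for topic in TOPICS]


index = SemanticIndex(toy_embed)
index.add("kb-1", "Resolve authentication error")
index.add("kb-2", "Download your invoice")
print(index.search("fix login problem", limit=1))
```

Normalising at index time is a design choice: it moves work out of the query path, which matters once the store holds many vectors.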

For most apps, this is a backend addition. You are not rewriting your data model. You are adding a vector representation alongside what you already have.

The main decision is where to store the vectors. If you are already on Postgres, pgvector is the lowest-friction path. If you need scale or hybrid search features, a dedicated vector database is worth the extra setup.

Our guide on adding AI to your existing app or software covers the full integration pattern, including how semantic search fits into a broader AI feature rollout.

When should a CTO actually prioritise this?

Not every app needs semantic search. But there are clear signals that tell you when it is worth building.

Your users are searching and leaving. High search volume with few clicks on the results is a strong signal your search is failing. Users are asking questions your system cannot answer.

Your content is unstructured. Docs, support tickets, knowledge bases, product descriptions. Keyword search struggles with varied phrasing. Semantic search handles it.

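You have multilingual users. Multilingual embedding models place text from different languages in the same vector space. You can search English content with a Spanish query and often get accurate results without any translation layer. Check that the model you pick was trained on the languages you need, because quality varies by language.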

You are building AI features anyway. Once you have embeddings and a vector store in place, retrieval-augmented generation (RAG) becomes much easier to add. The infrastructure overlaps significantly.

If you are weighing this decision for your platform, the technology considerations for CTOs section of the Devwiz site breaks down how to evaluate AI additions against engineering capacity.

What are the real costs and performance trade-offs?

Semantic search is not free, but the costs are manageable.

Embedding cost. Running content through an embedding model costs money. For OpenAI's `text-embedding-3-small`, the rate is around $0.02 per million tokens. A medium-sized knowledge base of 50,000 documents typically costs a few dollars to index. Re-embedding on content updates is the ongoing cost.
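The arithmetic behind that estimate is short. The sketch below assumes an average of 2,000 tokens per document, which is a guess for illustration and not a figure from any benchmark. Substitute your own average. The function name is hypothetical.

```python
def embedding_cost_usd(num_docs: int, avg_tokens_per_doc: int, usd_per_million_tokens: float) -> float:
    """One-off cost to embed a corpus: total tokens times the per-token rate."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens


# 50,000 documents, an assumed 2,000 tokens each, at $0.02 per million tokens.
cost = embedding_cost_usd(50_000, 2_000, 0.02)
print(f"${cost:.2f}")  # prints $2.00
```

The same function gives the ongoing cost if you pass in the number of documents that change per month instead of the full corpus size.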

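Latency. Embedding a query at search time adds one model call. That is a few milliseconds for a small self-hosted model and usually tens of milliseconds or more for a hosted API, because of the network round trip. Vector similarity search on a well-indexed store returns in under 100ms at most scales. End users rarely notice.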

Accuracy ceiling. Semantic search is better than keyword search for natural language queries. It is not perfect. Domain-specific jargon can trip up general-purpose models. Fine-tuned or domain-specific embedding models solve this at higher cost.

Hybrid search. Many production systems combine semantic and keyword search. You run both, then merge the results using a ranking algorithm like Reciprocal Rank Fusion. This gets you the best of both: exact match for product codes and SKUs, meaning match for natural language queries.
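Reciprocal Rank Fusion is small enough to show in full. Each document scores the sum of 1 / (k + rank) across the lists it appears in, so only rank positions matter and the two scoring scales never need to be reconciled. The sketch below uses k = 60, a commonly used default. The document ids are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists. Each doc scores sum(1 / (k + rank)) over the lists it is in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to id order for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from the two search paths.
keyword_hits = ["sku-4417", "doc-login", "doc-billing"]
semantic_hits = ["doc-login", "doc-auth", "sku-4417"]

print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

Documents that appear in both lists rise to the top, while a document found by only one path still makes the merged list further down.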

How Devwiz approaches semantic search builds

Devwiz has been building apps since 2015, with 200+ shipped across clients including NSW Government, Briometrix, Vivid, and Huskee. Semantic search has become a standard component in AI-enabled products, whether that is an internal knowledge tool, a customer-facing support assistant, or a product recommendation engine.

The implementation pattern is consistent. Evaluate the embedding model against your content type. Build the indexing pipeline so it runs on content updates, not just at setup. Add hybrid search if the content mix includes structured identifiers. Instrument the search to track query quality from day one.

James Killick, who leads product and AI strategy at Devwiz, writes about the practical side of AI integration at jameskillick.co. The focus is always on what gets shipped and what actually improves product metrics.

If you want semantic search built into your product properly, the Devwiz AI app development service is the right starting point. We scope the integration, pick the right stack for your data volume, and build it into your existing system without a rewrite.

---

FAQ

Q: Does semantic search replace keyword search completely?

A: Not in most production systems. Keyword search is still better for exact matches like product codes, names, and IDs. The standard approach is hybrid search: run both, then combine the results using a ranking function. You get accurate exact matches and natural language understanding in the same query.

Q: How accurate is semantic search compared to keyword search?

A: For natural language queries, semantic search is usually more accurate. Figures of 15-40% improvement in recall over keyword-only search are often reported on retrieval benchmarks, but the gain depends on the dataset and the embedding model, so measure it on your own queries. The gap is largest when users phrase things differently to how the content is written, which is most of the time.

Q: Can I add semantic search without switching databases?

A: Yes. If you run Postgres, the pgvector extension adds vector storage and similarity search directly. You keep your existing data model and add a vector column. No separate database required. For higher scale or more advanced filtering, a dedicated vector database like Pinecone or Qdrant is worth considering.

Q: How do I keep the search index up to date as content changes?

A: You build an indexing pipeline that runs on content events: create, update, delete. When a document changes, re-embed it and update the vector store. Most teams trigger this via a background job or a webhook from the content system. The embedding step is fast enough that near-real-time indexing is practical.
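A minimal sketch of such an event handler follows. `IndexSync` and its `handle` method are hypothetical names, the two dictionaries stand in for a real vector store, and `fake_embed` is a placeholder for the model call. The content-hash check is an assumption added here to skip re-embedding when the text has not changed. It is optional.

```python
import hashlib
from typing import Callable

Vector = list[float]


class IndexSync:
    """Keeps a vector store in step with content events (hypothetical sketch)."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.vectors: dict[str, Vector] = {}  # stand-in for the vector store
        self.hashes: dict[str, str] = {}      # hash of the text last indexed

    def handle(self, event: str, doc_id: str, text: str = "") -> bool:
        """Apply one content event. Returns True if the store changed."""
        if event == "delete":
            self.hashes.pop(doc_id, None)
            return self.vectors.pop(doc_id, None) is not None
        if event not in ("create", "update"):
            raise ValueError(f"unknown event: {event}")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.hashes.get(doc_id) == digest:
            return False  # text unchanged, so skip the embedding call
        self.vectors[doc_id] = self.embed(text)
        self.hashes[doc_id] = digest
        return True


calls: list[str] = []


def fake_embed(text: str) -> Vector:
    """Placeholder for a real embedding call. Records how often it runs."""
    calls.append(text)
    return [float(len(text))]


sync = IndexSync(fake_embed)
sync.handle("create", "doc-1", "Reset your password")
sync.handle("update", "doc-1", "Reset your password")  # same text, no re-embed
sync.handle("update", "doc-1", "Reset or change your password")
sync.handle("delete", "doc-1")
print(len(calls), sync.vectors)
```

In a background job or webhook handler, each incoming content event would map to one `handle` call.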

Q: What embedding model should I start with?

A: For most apps, OpenAI's `text-embedding-3-small` is the right starting point. It is accurate, cheap, and its vectors can be stored in any major vector database. If you have specific domain vocabulary (medical, legal, financial) or need to run models on your own infrastructure for data privacy reasons, open-source models like `all-MiniLM-L6-v2` from Sentence Transformers are a solid option, because you can self-host them and fine-tune them on your own data.

About James Killick

James is a co-founder of Devwiz and an AI product specialist. Since 2015 he has helped ship 200+ apps for founders, businesses and government, including work for NSW Government, Briometrix and Huskee. He builds AI-first platforms and writes about turning a proven program into software. He also hosts the Up in the AI podcast.

Tags: AI Integration