
Vector Databases Explained

By James Killick, October 9, 2024

TL;DR: A vector database stores data as numbers that represent meaning, so your app can find similar things rather than exact matches. This is what powers semantic search, AI chat with memory, and recommendation engines. If you are adding AI to an existing app, you almost certainly need one.

A vector database stores data as mathematical representations of meaning. Instead of matching keywords, it finds things that are conceptually similar. That is what makes AI-powered search, chat memory, and recommendations actually work.

If you are building AI features into a product, you will hit this concept early. Here is what you need to know.

What does a vector database actually do?

Every piece of content, whether a support ticket, a product description, or a user message, gets converted into a list of numbers called a vector, or embedding. Similar content produces similar numbers.

When a user searches or asks a question, that query also gets converted. The database then finds stored vectors that are closest to the query vector. That is called similarity search.
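To make "closest" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three-dimensional vectors and variable names are invented for illustration. Real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration (not output from a real model).
refund_ticket = [0.9, 0.1, 0.0]
money_back_query = [0.8, 0.2, 0.1]
shipping_ticket = [0.1, 0.9, 0.2]

sim_refund = cosine_similarity(money_back_query, refund_ticket)
sim_shipping = cosine_similarity(money_back_query, shipping_ticket)
print(sim_refund > sim_shipping)  # the refund ticket is the closer match
```

A score near 1 means the two vectors point in almost the same direction. A score near 0 means they have little in common.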

A standard SQL database cannot do this out of the box. It is built for exact lookups: find the row where `id = 42`. A vector database is built for approximate matches: find the 10 rows most similar to this query.
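The difference between the two kinds of query can be sketched in a few lines. This is an in-memory illustration, not how a production vector database works internally. The rows, ids, and two-dimensional embeddings are hypothetical.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical rows: id -> (text, toy embedding).
rows = {
    41: ("How do I reset my password?", [0.9, 0.1]),
    42: ("Where is my invoice?", [0.1, 0.9]),
    43: ("I forgot my login details", [0.8, 0.3]),
}

# Exact lookup: one key, one row, no notion of "close".
exact = rows[42]

def top_k(query_vec, k):
    """Brute-force nearest neighbours: score every row, keep the k best."""
    scored = ((cosine(query_vec, vec), row_id) for row_id, (_, vec) in rows.items())
    return [row_id for _, row_id in heapq.nlargest(k, scored)]

nearest = top_k([1.0, 0.2], k=2)
print(exact[0])
print(nearest)
```

Scanning every row like this is fine for a demo. Vector databases typically avoid the full scan by using approximate nearest-neighbour indexes such as HNSW, which trade a little accuracy for much faster queries.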

Tools like Pinecone, Weaviate, and pgvector (a Postgres extension) handle the storage and search side. You still need a model to create the embeddings in the first place.

Why does this matter for AI apps?

Large language models (LLMs) like GPT have a context window. They can only read so much text at once. If your product has thousands of documents, you cannot stuff them all into a prompt.

Vector databases solve this. You store all your content as embeddings. At query time, you pull only the most relevant chunks and pass those to the model. This pattern is called Retrieval-Augmented Generation, or RAG.
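As a rough sketch of the "pull only the most relevant chunks" step, the function below packs ranked chunks into a prompt until a size budget runs out. It counts words as a crude stand-in for tokens, which is an assumption. The function name, prompt wording, and sample chunks are hypothetical.

```python
def build_prompt(question: str, ranked_chunks: list[str], max_words: int) -> str:
    """Pack the best-ranked chunks into the prompt until the word budget runs out."""
    selected = []
    used = 0
    for chunk in ranked_chunks:  # assumed to be sorted most relevant first
        size = len(chunk.split())
        if used + size > max_words:
            break
        selected.append(chunk)
        used += size
    context = "\n---\n".join(selected)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Sample chunks, already ranked by a (hypothetical) similarity search.
chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Refund requests go through the billing portal.",
    "Our office is closed on public holidays.",
]
prompt = build_prompt("How long do refunds take?", chunks, max_words=16)
print(prompt)
```

With a budget of 16 words, the first two chunks fit and the third is left out. A real system would count tokens with the model's own tokenizer.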

RAG is how you build:

  • AI chat that knows your product docs
  • Search that understands intent, not just keywords
  • Recommendation engines that go beyond category tags
  • Support tools that surface the right answer from a large knowledge base

Without a vector database, your AI app either has a tiny memory or a bloated, expensive prompt.

How do embeddings get created?

An embedding model reads your content and outputs a vector. OpenAI's `text-embedding-3-small` is a common choice. You send text in, you get a list of numbers back.
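Here is a minimal sketch of that call using the official `openai` Python SDK. The helper names `embed_texts` and `make_client` are hypothetical. The sketch assumes the package is installed and an `OPENAI_API_KEY` environment variable is set.

```python
def embed_texts(client, texts: list[str], model: str = "text-embedding-3-small") -> list[list[float]]:
    """Embed a batch of texts and return one vector per input, in input order."""
    response = client.embeddings.create(model=model, input=texts)
    # Each returned item carries the index of the input it belongs to.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]

def make_client():
    """Create an OpenAI client (needs the `openai` package and OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK
    return OpenAI()

# Usage (not run here, because it makes a billed network call):
# vectors = embed_texts(make_client(), ["How do I reset my password?"])
# len(vectors[0]) is 1536 by default for text-embedding-3-small
```

Sending texts in batches rather than one at a time reduces the number of network round trips when you embed a large backlog.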

You run this process once for all your existing content, then again each time new content is added. Those vectors go into your vector database.

At query time:

  1. The user's question gets embedded using the same model
  2. The database finds the closest stored vectors
  3. The matching content gets retrieved
  4. That content goes into the LLM prompt as context
  5. The model generates a response grounded in your actual data

This is the core loop for most RAG-based AI features. The quality of your embeddings and how you chunk your content both affect how well it works.
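The five steps can be sketched end to end with stand-ins for the two external services. `toy_embed` counts words from a tiny made-up vocabulary and `echo_llm` just returns its prompt. Both are hypothetical placeholders for a real embedding model and a real LLM call.

```python
import math
from typing import Callable

Vector = list[float]
VOCAB = ["refund", "invoice", "password", "login"]  # toy vocabulary, hypothetical

def toy_embed(text: str) -> Vector:
    """Stand-in for a real embedding model: counts vocabulary words."""
    words = text.lower().replace("?", "").replace(".", "").split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def answer(question: str, store: list[tuple[str, Vector]],
           embed: Callable[[str], Vector], llm: Callable[[str], str], k: int = 1) -> str:
    query_vec = embed(question)                              # 1. embed the question
    ranked = sorted(store, key=lambda row: cosine(query_vec, row[1]),
                    reverse=True)                            # 2. find closest vectors
    context = "\n".join(text for text, _ in ranked[:k])      # 3. retrieve the content
    prompt = f"Context:\n{context}\n\nQuestion: {question}"  # 4. build the prompt
    return llm(prompt)                                       # 5. generate a response

docs = ["Reset your password from the login page.", "Download your invoice from billing."]
store = [(doc, toy_embed(doc)) for doc in docs]  # same embed function as at query time

def echo_llm(prompt: str) -> str:
    """Stand-in for a real LLM call: returns the prompt it was given."""
    return prompt

result = answer("I forgot my password", store, toy_embed, echo_llm)
print(result)
```

Passing `embed` and `llm` in as functions is a design choice for this sketch. It keeps the loop testable and makes it obvious that the same embedding function is used for documents and queries.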

When do you actually need one?

You need a vector database when your AI feature depends on finding relevant content from a large body of data. Some clear signals:

  • You want AI chat that references your own documents or knowledge base
  • You are building semantic search (search by meaning, not exact words)
  • You need personalised recommendations based on user behaviour or preferences
  • You are storing user conversation history and need to retrieve relevant past context
  • Your dataset is too large to fit in a single prompt

If you are only calling an LLM with a fixed prompt and no dynamic content retrieval, you can skip it for now. But most production AI features eventually need one.

For CTOs thinking through the full architecture, the AI integration planning guide for technical leaders covers how these components fit into your existing stack.

How does a vector database fit into an existing app?

Most apps already have a primary database such as Postgres, MySQL, or MongoDB. The vector database sits alongside it, not instead of it.

Your existing database still stores structured data: users, orders, settings. The vector database stores embeddings of content that needs to be searched semantically.

The integration pattern looks like this:

  • Content is written to both databases (structured data to SQL, embeddings to the vector store)
  • User queries hit the vector database first to retrieve relevant content
  • That content is passed to the LLM along with the user query
  • The response is returned to the user
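One way to picture the pattern above is a dual write keyed by a shared id. The sketch below uses SQLite for the structured side and a plain dictionary as a stand-in for the vector store. The table name, `toy_embed`, and the sample articles are all hypothetical.

```python
import sqlite3

def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: two crude keyword features."""
    lowered = text.lower()
    return [float(lowered.count("refund")), float(lowered.count("shipping"))]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
vector_store: dict[int, list[float]] = {}  # stand-in for Pinecone, Weaviate, or pgvector

def save_article(title: str, body: str) -> int:
    """Write the row to SQL, then the embedding to the vector store under the same id."""
    cursor = db.execute("INSERT INTO articles (title, body) VALUES (?, ?)", (title, body))
    db.commit()
    article_id = cursor.lastrowid
    vector_store[article_id] = toy_embed(body)
    return article_id

def search(query: str) -> str:
    """Find the closest vector, then load the full row from SQL by id."""
    q = toy_embed(query)
    best_id = max(vector_store, key=lambda i: sum(a * b for a, b in zip(q, vector_store[i])))
    row = db.execute("SELECT title FROM articles WHERE id = ?", (best_id,)).fetchone()
    return row[0]

save_article("Refund policy", "A refund is available within 14 days.")
save_article("Delivery times", "Standard shipping takes 3 to 5 days.")
top_title = search("How does shipping work?")
print(top_title)
```

The shared id is what ties the two stores together. The vector store answers "which items are relevant" and the primary database remains the source of truth for what those items contain.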

If you are working with an existing codebase, adding AI features to an existing app walks through this kind of incremental approach in more detail.

If you are already on Postgres, pgvector is worth considering. It adds vector search as an extension, so you avoid running a separate system. For very large or latency-sensitive workloads, a dedicated vector database like Pinecone can give you more control over indexing and performance.
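For the pgvector route, a setup might look roughly like the sketch below. The SQL is held as Python strings so it can be passed to a Postgres driver. The table and column names are hypothetical, and the 1536 dimension assumes the default output size of `text-embedding-3-small`.

```python
# SQL for pgvector, held as Python strings. Table and column names are hypothetical.
ENABLE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS doc_chunks (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536)  -- must match the embedding model's output size
);
"""

# `<=>` is pgvector's cosine distance operator; smaller means more similar.
NEAREST_CHUNKS = """
SELECT id, content
FROM doc_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def to_vector_literal(values: list[float]) -> str:
    """Format a Python list in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

literal = to_vector_literal([0.1, 0.2, 0.3])
print(literal)
# With a driver such as psycopg, the query would run roughly as:
# cursor.execute(NEAREST_CHUNKS, (literal, 10))
```

The `%s` placeholders follow the psycopg parameter style, which is an assumption about the driver. Other drivers use different placeholder syntax.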

What should CTOs watch out for?

A few things trip teams up early:

Chunking strategy matters. How you split documents into chunks before embedding affects retrieval quality significantly. Too large and the retrieved context is noisy. Too small and you lose important surrounding meaning.
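A common starting point is fixed-size chunks with a small overlap, so a sentence cut at a boundary still appears whole in one chunk. The sketch below splits on words for simplicity. The function name and sizes are illustrative, and many teams split on tokens, sentences, or headings instead.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words with the previous chunk."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

chunks = chunk_words("one two three four five six seven eight", chunk_size=4, overlap=1)
print(chunks)
# ['one two three four', 'four five six seven', 'seven eight']
```

Chunk size and overlap are worth tuning against real queries, since the right values depend on how your documents are written.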

Embedding model consistency. You must use the same model to create embeddings and to embed queries at search time. Switching models later means re-embedding all your data.
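One hedge against accidental mixing is to store the model name next to every vector and check it at query time. This is a sketch of that idea, not a feature of any particular database. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoredVector:
    doc_id: str
    model: str  # name of the embedding model that produced `values`
    values: tuple[float, ...]

class ModelMismatchError(ValueError):
    pass

def check_compatible(query_model: str, stored: StoredVector) -> None:
    """Refuse to compare vectors produced by different embedding models."""
    if stored.model != query_model:
        raise ModelMismatchError(
            f"{stored.doc_id} was embedded with {stored.model!r}, "
            f"but the query uses {query_model!r}; re-embed before searching"
        )

record = StoredVector("doc-1", "text-embedding-3-small", (0.1, 0.2))
check_compatible("text-embedding-3-small", record)  # passes silently

try:
    check_compatible("text-embedding-3-large", record)
except ModelMismatchError as error:
    mismatch_message = str(error)
    print(mismatch_message)
```

Vectors from different models live in different spaces, so a similarity score between them is meaningless even when the dimensions happen to match.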

Cost at scale. Embedding generation and vector storage are cheap at small scale. As your dataset grows, the cost of re-embedding and the latency of querying millions of vectors add up. Plan for this early.
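A back-of-envelope estimate helps here. The sketch below assumes float32 vectors and ignores index overhead, which can be substantial. The price and token count are placeholders, not real quotes, so substitute your provider's current pricing.

```python
def vector_storage_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the vectors alone, assuming float32 values and no index overhead."""
    return num_vectors * dimensions * bytes_per_value

def embedding_cost(total_tokens: int, price_per_million_tokens: float) -> float:
    """Cost of embedding a corpus once at a given (assumed) price."""
    return total_tokens / 1_000_000 * price_per_million_tokens

# One million chunks at 1536 dimensions (the default size of text-embedding-3-small).
raw_bytes = vector_storage_bytes(1_000_000, 1536)
print(raw_bytes / 1e9, "GB of raw vectors")

# PLACEHOLDER price and token count, for illustration only.
cost = embedding_cost(total_tokens=500_000_000, price_per_million_tokens=0.10)
print(cost)
```

One million 1536-dimension vectors come to roughly 6 GB before any index is built. Remember that switching embedding models means paying the embedding cost for the whole corpus again.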

Staleness. If your source content changes, the stored embeddings need to be updated. You need a process to keep them in sync.
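One simple way to keep them in sync is to store a hash of each document's text when you embed it, then compare hashes on a schedule. The sketch below is one possible approach. The function names and sample documents are hypothetical.

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_sync(source: dict[str, str], indexed_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare current content with the hashes stored at embedding time.

    Returns (ids to embed or re-embed, ids to delete from the vector store).
    """
    to_embed = sorted(
        doc_id for doc_id, text in source.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    to_delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in source)
    return to_embed, to_delete

# Hashes recorded when each document was last embedded.
indexed = {
    "faq": content_hash("Refunds take 14 days."),
    "pricing": content_hash("Plans start at $10."),
    "old-page": content_hash("This page was retired."),
}
# What the source content looks like now.
current = {
    "faq": "Refunds take 14 days.",    # unchanged
    "pricing": "Plans start at $12.",  # edited since it was embedded
    "careers": "We are hiring.",       # new
}
to_embed, to_delete = plan_sync(current, indexed)
print(to_embed, to_delete)
# ['careers', 'pricing'] ['old-page']
```

Only changed and new documents get re-embedded, which keeps the sync cheap even when the corpus is large.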

Teams building AI programs for clients use frameworks like Njin to structure this kind of technical decision-making, so the AI features they ship are grounded in real business logic, not just demos.

What does this look like in a real product?

At Devwiz, we have been building AI platforms and programs since before the current wave. Across 200+ apps since 2015, the teams that get the most out of AI are the ones who treat data architecture as seriously as model selection.

For Briometrix, that meant building a search layer that could find biomechanical content by conceptual similarity, not just keyword overlap. For clients in the NSW Government, it meant a retrieval layer that could surface relevant policy sections from large document sets without hallucinating answers.

The vector database is the infrastructure layer that makes this possible. Get it right and your AI feature works. Get it wrong and you get confident but wrong answers, slow searches, or a model with no useful memory.

If you are working through whether this fits your product, the AI app development service page covers how we approach this.

---

Ready to build an AI feature that actually works at scale? Talk to the team at Devwiz.

Frequently asked questions

What is a vector database in simple terms?

A vector database stores data as lists of numbers that represent meaning. When you search, it finds stored items with similar numbers rather than matching exact words. This is what powers AI search, chat memory, and recommendations. Think of it as a database that understands concepts rather than just keywords.

Do I need a vector database to use an LLM like GPT-4?

Not always. If you are passing a fixed prompt to an LLM, you do not need one. But if your app needs to retrieve relevant information from a large set of documents, user history, or product data before generating a response, a vector database is how you do that efficiently. Most production AI features end up needing one.

What is the difference between a vector database and a regular database?

A regular database finds exact matches: rows where a field equals a specific value. A vector database finds approximate matches: stored items that are most similar to a query. They solve different problems and usually sit alongside each other in a production stack. You keep your structured data in SQL and your embeddings in the vector store.

Which vector database should I use?

If you are already on Postgres, pgvector is the simplest starting point. It adds vector search as an extension with no new infrastructure. For higher scale or more complex retrieval needs, Pinecone and Weaviate are purpose-built and offer more control. The right choice depends on your data volume, query patterns, and team familiarity with the tooling.

How much does it cost to add a vector database to an app?

At small scale, costs are low. Embedding generation via OpenAI costs fractions of a cent per document. Pgvector is free if you are already on Postgres. Dedicated vector databases like Pinecone start free and scale with usage. For most early-stage AI features, the cost is not a blocker. At large scale, re-embedding and query costs need to be planned for.

About James Killick

James is a co-founder of Devwiz and an AI product specialist. Since 2015 he has helped ship 200+ apps for founders, businesses and government, including work for NSW Government, Briometrix and Huskee. He builds AI-first platforms and writes about turning a proven program into software. He also hosts the Up in the AI podcast.

jameskillick.co · LinkedIn · AI Orchestrators

Tags: AI Integration