Embeddings Explained Simply

By James Killick · January 16, 2025

TL;DR: An embedding is a list of numbers that tells a computer what something means, not just what it says. That shift from keywords to meaning is what makes modern AI search, recommendations, and chatbots actually useful. Once you understand embeddings, you understand the engine behind most AI features worth building.

An embedding is a list of numbers. That list tells a computer what a piece of text (or an image, or audio) *means*, not just what it literally says. That one idea is the engine behind AI search, smart recommendations, and most of what makes a chatbot feel like it actually understands you.

You do not need a maths degree to use them. You just need to know what they do and when to reach for them.

What does 'embedding' actually mean?

Imagine you want to compare two sentences:

'How do I reset my password?'
'I forgot my login details.'

A keyword search sees zero overlap. An embedding model reads both sentences and converts each one into a list of numbers. The two lists land close together in vector space because the sentences mean nearly the same thing.

That's the whole idea. Words and phrases that mean similar things get similar numbers. Words that mean different things get numbers that are far apart.

OpenAI's embeddings guide goes deeper on the maths if you want it. For most builders, the concept above is enough to get started.

Why does meaning-as-numbers matter?

Because computers can compare numbers. They cannot compare meaning directly.

Once you convert text into numbers, you can:

Find the most relevant answer to a question (semantic search)
Group similar support tickets together automatically
Recommend products that match what a user actually wants
Build a chatbot that answers from your own documents, which cuts down on hallucination

None of that works well with keyword matching alone. Embeddings are what make it work.

How do you actually create an embedding?

You send text to an embedding model and it returns a list of numbers. That's it from a code perspective.
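As a rough sketch of that call, assuming the official `openai` Python package (version 1 or later) and an API key in the `OPENAI_API_KEY` environment variable. The helper name `embed_text` is ours for illustration, not part of the library.

```python
def embed_text(client, text, model="text-embedding-3-small"):
    """Return the embedding for `text` as a plain list of floats.

    Assumption: `client` is an `openai.OpenAI()` instance (openai>=1.0).
    `embed_text` is a hypothetical helper name, not a library function.
    """
    response = client.embeddings.create(model=model, input=text)
    # One input string goes in, so one embedding comes back at index 0.
    return list(response.data[0].embedding)


# Usage (needs `pip install openai` and OPENAI_API_KEY set):
#   from openai import OpenAI
#   vector = embed_text(OpenAI(), "How do I reset my password?")
#   print(len(vector))
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to test and easy to point at a different provider later.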

Here's the rough flow:

You have a piece of text ('How do I reset my password?')
You send it to an embedding model (like OpenAI's `text-embedding-3-small`)
The model returns a vector, a list of numbers (1,536 of them by default for this model)
You store that vector in a vector database (Pinecone, pgvector, Weaviate)
At query time, you convert the user's question to a vector and search for the closest match

The 'closest match' step uses a calculation called cosine similarity, which scores how closely two vectors point in the same direction. You do not need to write it yourself. Vector databases provide it, or a close equivalent, built in.
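For intuition only, here is what that calculation looks like in plain Python. The three-number vectors are made up for illustration. Real embeddings come from a model and are far longer.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors, not real model output.
reset_password = [0.9, 0.1, 0.2]  # 'How do I reset my password?'
forgot_login = [0.8, 0.2, 0.3]    # 'I forgot my login details.'
pizza_recipe = [0.1, 0.9, 0.1]    # an unrelated sentence

print(cosine_similarity(reset_password, forgot_login))  # close to 1
print(cosine_similarity(reset_password, pizza_recipe))  # much lower
```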

What problems do embeddings actually solve?

Think about any search or matching problem in your app:

Support chatbots: User asks a question. You search your knowledge base with an embedding, pull the right answer, and feed it to the language model. The bot answers from your data, not from guesswork.
Document search: Legal, medical, or internal docs. Embeddings let staff search by intent rather than exact phrase.
Recommendations: A user reads an article about Python. You find articles with similar embeddings. You surface them without needing tags or categories.
Duplicate detection: Two support tickets mean the same thing but are worded differently. Embeddings catch that.
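Duplicate detection is the easiest of these to sketch. The example below assumes the tickets have already been embedded, uses made-up three-number vectors, and picks a similarity threshold of 0.95 purely as a starting guess. The function name `find_duplicates` is hypothetical, and the right threshold depends on your model and data.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_duplicates(embedded_tickets, threshold=0.95):
    """Return pairs of ticket ids whose vectors are at least `threshold` similar."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embedded_tickets.items(), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Made-up vectors standing in for real embeddings.
tickets = {
    "T-1": [0.9, 0.1, 0.2],  # 'How do I reset my password?'
    "T-2": [0.8, 0.2, 0.3],  # 'I forgot my login details.'
    "T-3": [0.1, 0.9, 0.1],  # 'My invoice is wrong.'
}

print(find_duplicates(tickets))  # [('T-1', 'T-2')]
```

Comparing every pair is fine for a few hundred tickets. Beyond that, let the vector store find each ticket's nearest neighbours instead.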

If you are adding AI to an existing app, embeddings are often the right starting point. Our guide on how to add AI to your existing app or software walks through where they fit in a real integration.

What is a vector database and do you need one?

A vector database stores your embeddings and lets you search them fast.

For small sets of data, maybe a few hundred records, you can store vectors in a regular database and do the similarity maths yourself. It works fine.
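Doing the similarity maths yourself can be this small. The sketch below keeps records in a plain Python list and scores every one against the query. The vectors are made up, and `top_k` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vector, records, k=3):
    """Score every (text, vector) record against the query and return the best k."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up vectors standing in for stored embeddings.
records = [
    ("Resetting your password", [0.9, 0.1, 0.2]),
    ("Updating billing details", [0.2, 0.8, 0.3]),
    ("Exporting your data", [0.1, 0.3, 0.9]),
]

# Stand-in for the embedded question 'I forgot my login details.'
query = [0.8, 0.2, 0.3]

for score, text in top_k(query, records, k=2):
    print(f"{score:.2f}  {text}")
```

This checks every record on every query, so the cost grows in step with the size of your data.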

For anything bigger, you want a dedicated vector store. Pinecone, Qdrant, and pgvector (a PostgreSQL extension) are the most common choices. They handle indexing and similarity search at scale without you writing the maths.

You do not need to pick one on day one. Start with what you have. Migrate when the volume demands it.

What are the limits of embeddings?

Embeddings are not magic. A few things to know:

They are only as good as the model that creates them. A weaker model, or one poorly matched to your domain or language, will produce less useful embeddings.
They work on meaning, not facts. They will surface relevant text even if that text is wrong. You still need to verify the source.
They add latency and cost. Every query that uses semantic search hits an embedding model and a vector database. That's two extra round trips.
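They need maintenance. If your data changes, the affected embeddings need to be regenerated. If you switch embedding models, all of them do, because vectors from different models are not comparable.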
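One common way to handle the maintenance point is to store a hash of each document's text alongside its embedding, then re-embed only what has changed. A minimal sketch, assuming you keep those hashes somewhere. The names `content_hash` and `find_stale` and the sample documents are hypothetical.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(documents, stored_hashes):
    """Return ids of documents that are new or changed since they were embedded.

    documents:     {doc_id: current text}
    stored_hashes: {doc_id: hash recorded when the embedding was created}
    """
    return [
        doc_id
        for doc_id, text in documents.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]


documents = {
    "faq-1": "Reset your password from the login page.",
    "faq-2": "Invoices are emailed monthly.",
}
stored_hashes = {
    "faq-1": content_hash("Reset your password from Settings."),  # old wording
    "faq-2": content_hash("Invoices are emailed monthly."),
}

print(find_stale(documents, stored_hashes))  # ['faq-1']
```

This only catches new and edited documents. Deleted documents need their vectors removed separately.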

For most production apps, these are manageable trade-offs. But go in with clear expectations.

What should a CTO or tech lead know before building with embeddings?

A few decisions matter early:

Which embedding model? OpenAI's `text-embedding-3-small` is cheap and solid. For on-premise or privacy-sensitive work, open-source models such as Nomic Embed, or those available through the sentence-transformers library, run locally.
Which vector store? pgvector is a good default if you are already on Postgres. Purpose-built stores like Pinecone or Qdrant suit high-volume workloads.
How do you chunk your data? Long documents need to be split into smaller pieces before embedding. The chunk size affects retrieval quality. This is where most teams make mistakes early.
How do you keep embeddings fresh? Build a pipeline that re-embeds content when it changes. A stale vector database quietly returns wrong answers.
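Chunking is worth seeing in code. The sketch below splits on words and overlaps neighbouring chunks so a sentence cut at a boundary still appears whole in one of them. The defaults of 200 words with a 40-word overlap are arbitrary assumptions, not recommendations, and real pipelines often count tokens or split on headings instead.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into word-based chunks, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("one two three four five six seven", chunk_size=4, overlap=1))
# ['one two three four', 'four five six seven']
```

Each chunk is then embedded and stored as its own record, usually with a pointer back to the source document.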

If you are leading a team that is new to this, check the resources for CTOs on how Devwiz approaches AI integration scoping.

Real-world embeddings in production

Devwiz has shipped AI features across 200+ apps since 2015, including work for NSW Government, Briometrix, Vivid, and Huskee. Embeddings show up in a lot of them, usually in one of three forms: semantic search over internal documents, AI-assisted support tooling, or recommendation layers on top of existing content.

The pattern is consistent: start with a clear retrieval problem, pick a proven embedding model, use a vector store that fits the data volume, and test retrieval quality with real user queries before launch.

For a broader look at what the AI integration space looks like in practice, the independent research at AILED tracks what teams are actually shipping, which is useful context before you commit to a stack.

---

Want to build a feature that uses embeddings? The Devwiz AI app development team can scope it, build it, and ship it. Get in touch.

---

FAQ

What is an embedding in simple terms?

An embedding is a list of numbers that represents the meaning of a piece of text. Words or sentences with similar meanings get similar numbers. That lets a computer compare meaning, not just keywords, which is the basis for AI search and most recommendation systems.

Do I need to understand the maths to use embeddings?

No. You call an API, get a list of numbers back, and store it. The similarity calculations happen inside the vector database. Most developers who ship embedding-based features never write cosine similarity by hand.

What is the difference between an embedding and a vector?

They are closely related. A vector is any ordered list of numbers. An embedding is a vector that a model has produced to represent meaning. Every embedding is a vector, but not every vector is an embedding. In practice, you will see both terms used interchangeably in the same documentation.

How much does it cost to use embeddings in production?

OpenAI's `text-embedding-3-small` costs roughly $0.02 per million tokens, which is very cheap. The main cost at scale is the vector database, which ranges from free software you host yourself (pgvector, where you still pay for the server) to a few hundred dollars a month for managed services handling millions of records.
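To put the per-token price in context, here is the arithmetic for a hypothetical workload. The document count and average length are invented for the example, and the price is the rough figure quoted above, so check current pricing before budgeting.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.02  # rough figure from this article, may change


def embedding_cost_usd(total_tokens, price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """One-off cost of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical workload: 50,000 documents averaging 800 tokens each.
tokens = 50_000 * 800
print(f"${embedding_cost_usd(tokens):.2f}")  # $0.80
```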

When should I use embeddings instead of a regular keyword search?

Use embeddings when users search by intent rather than exact phrase, when your content is unstructured or varies in wording, or when you need to match things that mean the same thing but are written differently. Keyword search is still faster and cheaper for exact lookups, so use both where it makes sense.

About James Killick

James is a co-founder of Devwiz and an AI product specialist. Since 2015 he has helped ship 200+ apps for founders, businesses and government, including work for NSW Government, Briometrix and Huskee. He builds AI-first platforms and writes about turning a proven program into software. He also hosts the Up in the AI podcast.

jameskillick.co · LinkedIn · AI Orchestrators

Tags: AI Integration