2026-03-25 RAG BM25 Vector Search ChromaDB LangChain Production AI ⏱ 4 min read

BM25 vs Vector Search — When and Why I Chose Each

When I built the RAG chatbot for this portfolio, I made a deliberate choice that goes against most tutorials: I used keyword search instead of a vector database. A few months later, when I built the Advanced RAG Chatbot project, I made the opposite call and reached for ChromaDB with a cross-encoder re-ranker.

Same problem domain. Different retrieval strategies. Here’s the decision logic behind each.

What Each Actually Does

BM25 is a probabilistic ranking function, the same family of scoring that powers classic keyword search engines. It scores documents by how often your query terms appear, weights each term by how rare it is across the whole corpus, and normalises for document length so long documents don't win by sheer size. No model, no GPU, no API call. Just counting and math.
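To make the "counting and math" concrete, here is a minimal sketch of Okapi BM25 in plain Python. It is an illustration, not the code the portfolio chatbot runs: the whitespace tokenizer, the sample documents, and the parameter defaults (k1=1.5, b=0.75) are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalised by length (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up documents for illustration only.
docs = [
    "projects built with flask and groq",
    "skills include python and sql",
    "experience in data engineering",
]
scores = bm25_scores("flask projects", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Only the first document contains either query term, so it is the only one with a non-zero score.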

Vector search converts text into high-dimensional embeddings, then finds documents whose vectors sit closest to your query’s vector in semantic space. It can match meaning, not just exact words — but it needs an encoder model, a vector index, and real infrastructure to serve it.
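The "closest vector" idea can be sketched without any model at all. In the example below the three-dimensional vectors are hand-written stand-ins for real encoder output, and the `nearest` helper is a hypothetical name; a real system would get its vectors from an embedding model and its lookup from a vector index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, index, top_k=1):
    """Brute-force search: rank every stored vector by similarity to the query."""
    ranked = sorted(
        index,
        key=lambda doc_id: cosine_similarity(query_vec, index[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hand-written toy vectors, NOT real embeddings.
index = {
    "context window management": [0.9, 0.1, 0.0],
    "deploying flask apps": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.3, 0.1]  # pretend embedding of "how do I handle token limits?"
result = nearest(query_vec, index)
print(result)
```

The query vector points in nearly the same direction as the first document's vector, so that document ranks first even though the texts share no words.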

Why I Used BM25 for PRIYA (This Portfolio Chatbot)

The knowledge base here is a curated set of facts about my projects, skills, and experience — roughly 60 documents, all written to be retrieved. When someone asks “what projects has Priyabrat built?” the answer lives in a document that literally contains the word projects.

In this setting, semantic drift is the enemy. A vector model might surface a vague match about “software development” when the user wants a specific project name. BM25 reliably surfaces the document that actually contains the query term.

The practical wins were decisive:

Zero infrastructure — no ChromaDB server, no embedding API calls, no added latency on every query
Fully deterministic — I can reason about exactly why a document was retrieved and debug a bad answer in seconds
Fast iteration — adding a new fact means editing a JSON file, not re-indexing embeddings
No cold-start cost — index builds in milliseconds at startup
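A rough sketch of what "edit a JSON file, rebuild at startup, see exactly why something matched" can look like. The fact schema (`id`, `text`), the sample facts, and the helper names are assumptions made for this example; the post does not show the real file format.

```python
import json
from collections import defaultdict

# Hypothetical contents of a facts file; the real schema is not shown in the post.
FACTS_JSON = """
[
  {"id": "projects", "text": "Projects built include a portfolio chatbot and an advanced RAG chatbot"},
  {"id": "skills", "text": "Skills include Python, Flask and retrieval pipelines"}
]
"""


def build_index(facts):
    """Map each term to the ids of the facts that contain it."""
    index = defaultdict(set)
    for fact in facts:
        for term in fact["text"].lower().replace(",", " ").split():
            index[term].add(fact["id"])
    return index


def explain(query, index):
    """Return, per fact id, the query terms that caused it to match."""
    matches = defaultdict(list)
    for term in query.lower().split():
        for fact_id in index.get(term, ()):
            matches[fact_id].append(term)
    return dict(matches)


facts = json.loads(FACTS_JSON)
index = build_index(facts)  # rebuilt from scratch on every start
explanation = explain("what projects use flask", index)
print(explanation)
```

Because the index is just a term-to-document map, a bad retrieval can be traced back to the specific words that triggered it.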

The chatbot runs on a small Flask server. Pulling in a vector DB would have tripled the infrastructure complexity for zero measurable quality gain on a 60-document corpus.

Why I Used ChromaDB for the Advanced RAG Chatbot

The Advanced RAG Chatbot project is a different problem entirely. The knowledge base is a corpus of arbitrary documents — PDFs, markdown files, text dumps — where users ask open-ended questions that may not share vocabulary with the source material.

Here, BM25 fails predictably. A user asking “how do I handle token limits?” might need a document that talks about “context window management” — zero keyword overlap, but high semantic relevance. That’s exactly where vector search earns its place.
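The zero-overlap claim is easy to check. This small sketch uses an assumed regex tokenizer and compares the vocabulary of the example query and document directly.

```python
import re


def terms(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


query = "how do I handle token limits?"
document = "context window management"

shared = terms(query) & terms(document)
print(shared)  # prints set()
```

With no shared terms, every per-term contribution in a BM25 score is zero, so the document cannot be ranked above any other non-matching document.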

The retrieval pipeline in that project has three stages:

Bi-encoder retrieval via ChromaDB with cosine similarity — fast candidate selection over the full corpus
Cross-encoder re-ranking (cross-encoder/ms-marco-MiniLM-L-6-v2) — slower but far more precise, re-scores the top-20 candidates to pick the best 5
Context injection into the LLM prompt with sliding-window conversation memory
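The three stages above can be sketched as a skeleton that takes the two models as plain callables. Everything here is illustrative: `candidate_search` and `pair_scorer` are hypothetical stand-ins (in a setup like the one described they would presumably wrap a ChromaDB collection query and a sentence-transformers `CrossEncoder`), the prompt layout is invented, and the window size of 4 turns is an assumption.

```python
from collections import deque


def retrieve_and_rerank(query, candidate_search, pair_scorer,
                        n_candidates=20, n_final=5):
    """Stage 1 fetches candidates cheaply; stage 2 re-scores each (query, doc) pair."""
    candidates = candidate_search(query, n_candidates)
    scored = [(pair_scorer(query, doc), doc) for doc in candidates]
    # Stable sort: ties keep the order produced by stage 1.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:n_final]]


def build_prompt(query, context_docs, history):
    """Stage 3: inject retrieved context and recent turns into the prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    turns = "\n".join(f"{role}: {text}" for role, text in history)
    return f"Context:\n{context}\n\nConversation:\n{turns}\n\nUser: {query}"


# Sliding window: the deque silently drops the oldest turn once it is full.
history = deque(maxlen=4)

# Stand-ins so the sketch runs without a model or a database.
corpus = [f"doc {i}" for i in range(30)]


def fake_search(query, n):
    return corpus[:n]


def fake_scorer(query, doc):
    return int(doc.split()[1]) % 7  # arbitrary toy relevance score


top_docs = retrieve_and_rerank("token limits", fake_search, fake_scorer)
history.append(("user", "hello"))
history.append(("assistant", "hi"))
prompt = build_prompt("token limits", top_docs, history)
print(top_docs)
```

Keeping the stages behind plain function arguments also makes it straightforward to test the re-ranking logic without loading either model.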

The cross-encoder step is the one most tutorials skip. It makes a measurable difference: the bi-encoder is optimised for speed across millions of documents; the cross-encoder is optimised for accuracy on a small set of candidates. Running both in sequence gets you the best of each.

The Crossover Point

Here’s the heuristic I use when starting a new RAG project:

Under ~200 documents, well-structured: BM25. Simpler, faster, easier to debug.
200–500 documents, mixed structure: consider hybrid retrieval. Run BM25 alongside a lightweight embedding model and fuse the two rankings with Reciprocal Rank Fusion, which combines rank positions rather than raw scores.
500+ documents, open-domain queries: vector search with re-ranking. The infrastructure overhead is worth it at this scale.
Multilingual or heavy paraphrase: vectors win regardless of corpus size. Term matching cannot connect a query and a document that share no words, which is the norm across languages.
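For the hybrid case, Reciprocal Rank Fusion is small enough to sketch in full. Each document earns 1 / (k + rank) from every ranking it appears in, and the sums decide the final order. The constant k=60 is the value commonly used in the literature; the document ids and the two input rankings below are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids; `k` dampens the weight of top ranks."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Made-up output from the two retrievers, best match first.
bm25_ranking = ["a", "b", "c"]
vector_ranking = ["b", "d", "a"]

fused_ranking = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused_ranking)
```

Because only rank positions are used, there is no need to put BM25 scores and cosine similarities on a common scale before combining them.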

The Real Lesson

The best retrieval strategy is not the most sophisticated one — it’s the one that fits your data size, query patterns, and operational constraints.

I have ChromaDB in my requirements.txt for the portfolio chatbot as a reminder of the road not taken. The day the knowledge base grows beyond a few hundred documents and users start asking genuinely fuzzy questions, it earns its place. Until then, BM25 does the job with fewer moving parts.

Pick boring tools until the problem forces you to pick interesting ones.

Stack: Flask + Groq API (portfolio chatbot), Flask + ChromaDB + sentence-transformers (Advanced RAG). Both use SSE streaming for token-by-token responses.
