When I built the RAG chatbot for this portfolio, I made a deliberate choice that goes against most tutorials: I used keyword search instead of a vector database. A few months later, when I built the Advanced RAG Chatbot project, I made the opposite call and reached for ChromaDB with a cross-encoder re-ranker.
Same problem domain. Different retrieval strategies. Here’s the decision logic behind each.
BM25 is a probabilistic ranking function, the same scoring approach behind classic search engines. It scores documents by how often your query terms appear, weighted by how rare those terms are across the whole corpus and normalised for document length. No model, no GPU, no API call. Just counting and math.
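To make "counting and math" concrete, here is a minimal pure-Python sketch of BM25 scoring. It is an illustration, not the portfolio chatbot's actual code: the `tokenize` helper, the three toy documents, and the defaults `k1=1.5` and `b=0.75` are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalised by length (b).
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy knowledge base (made up for this example).
docs = [
    "Projects: RAG chatbot, Advanced RAG Chatbot, portfolio site.",
    "Skills: Python, Flask, prompt engineering.",
    "Experience: backend development and data pipelines.",
]
scores = bm25_scores("what projects has Priyabrat built?", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Only the first document contains the word "projects", so it is the only one with a non-zero score. In practice a library would likely replace the hand-rolled function, but the arithmetic stays the same.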
Vector search converts text into high-dimensional embeddings, then finds documents whose vectors sit closest to your query’s vector in semantic space. It can match meaning, not just exact words — but it needs an encoder model, a vector index, and real infrastructure to serve it.
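The "closest vector" idea can be sketched without any infrastructure. The example below uses hand-made 3-dimensional vectors as stand-ins for real embeddings (an encoder model would produce hundreds of dimensions), and a brute-force scan instead of a vector index. The document IDs and numbers are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, doc_vecs, top_k=2):
    """Return (doc_id, similarity) pairs, most similar first."""
    ranked = sorted(
        ((doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical embeddings: in a real system these come from an encoder model.
doc_vecs = {
    "context-window-management": [0.9, 0.1, 0.0],
    "deployment-guide": [0.0, 0.2, 0.9],
    "prompt-templates": [0.5, 0.8, 0.1],
}
query_vec = [0.8, 0.2, 0.1]  # stand-in for an embedded user question

results = nearest(query_vec, doc_vecs)
print(results[0][0])
```

The ranking depends only on vector geometry, which is why a match can surface even when the query and the document share no words. A vector index exists to avoid the full scan that `nearest` performs here.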
The knowledge base here is a curated set of facts about my projects, skills, and experience — roughly 60 documents, all written to be retrieved. When someone asks “what projects has Priyabrat built?” the answer lives in a document that literally contains the word projects.
In this setting, semantic drift is the enemy. A vector model might surface a vague match about “software development” when the user wants a specific project name. BM25 reliably finds the exact document, because the question and the answer share the same words.
The practical wins were decisive:
The chatbot runs on a small Flask server. Pulling in a vector DB would have tripled the infrastructure complexity for zero measurable quality gain on a 60-document corpus.
The Advanced RAG Chatbot project is a different problem entirely. The knowledge base is a corpus of arbitrary documents — PDFs, markdown files, text dumps — where users ask open-ended questions that may not share vocabulary with the source material.
Here, BM25 fails predictably. A user asking “how do I handle token limits?” might need a document that talks about “context window management” — zero keyword overlap, but high semantic relevance. That’s exactly where vector search earns its place.
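A quick way to see the failure is to check term overlap directly. In this small sketch the document text is made up to match the "context window management" example; any purely lexical scorer, BM25 included, has nothing to work with when the intersection is empty.

```python
import re


def terms(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


query = "how do I handle token limits?"
# Hypothetical source passage that answers the question in different words.
document = "Context window management: truncate or summarise older messages."

overlap = terms(query) & terms(document)
print(overlap)
```

With no shared terms, every per-term contribution to a keyword score is zero, so the relevant document ranks no higher than an irrelevant one.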
The retrieval pipeline in that project runs in three stages. The final one is a cross-encoder re-ranker (cross-encoder/ms-marco-MiniLM-L-6-v2): slower but far more precise, it re-scores the top-20 candidates to pick the best 5.

The cross-encoder step is the one most tutorials skip. It makes a measurable difference: the bi-encoder is optimised for speed across millions of documents; the cross-encoder is optimised for accuracy on a small set of candidates. Running both in sequence gets you the best of each.
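The retrieve-then-re-rank shape can be sketched independently of any model. In the function below, `fast_score` plays the role of the bi-encoder similarity and `precise_score` the role of the cross-encoder; both are passed in as plain callables. The toy scorers (`word_overlap`, `jaccard`) and the corpus are stand-ins invented for this example, not the project's code. Only the 20-to-5 defaults come from the description above.

```python
from typing import Callable, List, Tuple


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    fast_score: Callable[[str, str], float],
    precise_score: Callable[[str, str], float],
    n_candidates: int = 20,
    n_final: int = 5,
) -> List[Tuple[str, float]]:
    """Cheap scoring over the whole corpus, costly scoring over a shortlist."""
    # Pass 1: rank everything with the cheap scorer and keep a shortlist.
    shortlist = sorted(
        corpus, key=lambda doc: fast_score(query, doc), reverse=True
    )[:n_candidates]
    # Pass 2: re-score only the shortlist with the expensive scorer.
    rescored = [(doc, precise_score(query, doc)) for doc in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:n_final]


# Toy stand-ins for the two models (assumptions for illustration only).
def word_overlap(query: str, doc: str) -> float:
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def jaccard(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)


corpus = [
    "token limits and context window management",
    "token pricing tiers",
    "context window management for long chats with many token heavy messages",
    "deployment checklist",
]
top = retrieve_then_rerank(
    "token limits", corpus, word_overlap, jaccard, n_candidates=3, n_final=2
)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

The design point is that the expensive scorer runs `n_candidates` times per query rather than once per document in the corpus, which is what makes a slow but precise model affordable.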
Here’s the heuristic I use when starting a new RAG project:
The best retrieval strategy is not the most sophisticated one — it’s the one that fits your data size, query patterns, and operational constraints.
I have ChromaDB in my requirements.txt for the portfolio chatbot as a reminder of the road not taken. The day the knowledge base grows beyond a few hundred documents and users start asking genuinely fuzzy questions, it earns its place. Until then, BM25 does the job with fewer moving parts.
Pick boring tools until the problem forces you to pick interesting ones.
Stack: Flask + Groq API (portfolio chatbot), Flask + ChromaDB + sentence-transformers (Advanced RAG). Both use SSE streaming for token-by-token responses.
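For readers unfamiliar with SSE, token-by-token streaming comes down to framing each token as a `data:` line followed by a blank line. The generator below is a minimal sketch of that framing, not the code from either project: the JSON payload shape and the `done` event name are assumptions made for this example.

```python
import json


def sse_stream(tokens):
    """Yield each token as one Server-Sent Events message."""
    for token in tokens:
        # JSON-encode so a newline inside a token cannot break the framing.
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Hypothetical end-of-stream event the client can listen for.
    yield "event: done\ndata: {}\n\n"


frames = list(sse_stream(["BM25", " is", " enough"]))
print(frames[0], end="")
```

In a Flask app, a generator like this would typically be returned as `Response(sse_stream(tokens), mimetype="text/event-stream")`, with the tokens coming from the model's streaming API rather than a fixed list.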