System Architecture — Priyabrat Dalbehera

RAG Pipeline

The chatbot uses a custom BM25-based Retrieval-Augmented Generation pipeline. Instead of heavy vector embeddings, it uses lightweight keyword scoring (rank-bm25) over structured JSON knowledge items, giving fast and accurate retrieval with zero GPU overhead.

1. 💬 User Query
2. 🔍 Off-Topic Detection: if off-topic → 🚫 Blocked Reply (streamed); otherwise ✓ allowed
3. ⚙️ Query Expansion: synonyms · tech aliases · tag names
4. 📊 BM25 Retrieval (rank-bm25 · top_k = 8) over the Knowledge Sources: 📁 projects.json · 🕐 timeline.json · 📝 notes.json · 📚 KNOWLEDGE[ ]
5. 📎 Context Assembly + Relevant Links
6. 📋 System Prompt + Conversation History + Retrieved Context
7. 🚀 Groq LPU (Llama 3.3-70B / 3.1-8B) or 🧠 OpenAI (GPT-4o / GPT-4o-mini)
8. ✨ Streaming Response → Browser

📊 BM25 Retrieval

Uses Okapi BM25 over tokenized knowledge items. No model download and no GPU: scoring relies only on term frequency, inverse document frequency, and document length. The expanded query improves recall for synonyms.

from rank_bm25 import BM25Okapi

bm25 = BM25Okapi(tokenized_corpus)
scores = bm25.get_scores(_tokenize(query))
top_idx = scores.argsort()[::-1][:8]
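To make the scoring concrete, here is a minimal pure-Python sketch of an Okapi-style BM25 scorer. It is a stand-in for illustration, not the rank-bm25 implementation; the tokenizer, the sample corpus, and the k1 and b defaults are all assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query_tokens, tokenized_corpus, k1=1.5, b=0.75):
    """Score every document against the query with an Okapi-style BM25 formula."""
    n_docs = len(tokenized_corpus)
    avg_len = sum(len(doc) for doc in tokenized_corpus) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized_corpus for term in set(doc))
    scores = []
    for doc in tokenized_corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def top_k(scores, k=8):
    """Indices of the k highest-scoring documents, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical knowledge items, not the portfolio's real data.
corpus = [
    "Finance AI agent built with LangGraph and Groq",
    "Advanced RAG chatbot with ChromaDB and cross-encoder re-ranking",
    "Contact: email, LinkedIn, GitHub",
]
tokenized_corpus = [tokenize(doc) for doc in corpus]
scores = bm25_scores(tokenize("rag chatbot"), tokenized_corpus)
best = top_k(scores, k=2)
```

Documents that share no term with the query score zero, which is why query expansion matters for recall.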

⚙️ Query Expansion

Before retrieval the query is expanded with domain synonyms. Contact queries expand with ['email','linkedin','github']; project queries expand with project titles and tag aliases.

# Tech alias expansion
tech_aliases = {
  'langchain': ['rag', 'chain', 'retrieval'],
  'crewai':    ['multi-agent', 'crew'],
}
for key, aliases in tech_aliases.items():
    if key in q_lower: expansion += aliases
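The expansion step can be sketched as a small self-contained function. The alias table is copied from the snippet above; the contact trigger words and the de-duplication policy are assumptions made for illustration.

```python
import re

TECH_ALIASES = {
    "langchain": ["rag", "chain", "retrieval"],
    "crewai": ["multi-agent", "crew"],
}
CONTACT_TRIGGERS = {"contact", "reach", "hire"}   # assumed trigger words
CONTACT_TERMS = ["email", "linkedin", "github"]

def expand_query(query):
    """Return the query with alias and contact terms appended, without duplicates."""
    q_lower = query.lower()
    words = set(re.findall(r"[a-z0-9-]+", q_lower))
    expansion = []
    for key, aliases in TECH_ALIASES.items():
        if key in q_lower:
            expansion += aliases
    if words & CONTACT_TRIGGERS:
        expansion += CONTACT_TERMS
    # Keep first-seen order and drop terms the query already contains.
    extra = [term for term in dict.fromkeys(expansion) if term not in words]
    return " ".join([query] + extra)

expanded = expand_query("How did you use LangChain?")
```

Appending terms rather than replacing the query keeps the user's original words in the BM25 match.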

🔍 Off-Topic Detection

A whitelist approach checks against PORTFOLIO_WORDS and CODING_WORDS lists. Short follow-ups (≤10 words) in ongoing conversations always pass through.

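def is_off_topic(query, has_history):
    q = query.lower()
    # Allow anything that mentions a portfolio or coding keyword
    if any(w in q for w in PORTFOLIO_WORDS):
        return False
    if any(w in q for w in CODING_WORDS):
        return False
    # Short follow-ups in an ongoing conversation always pass
    if has_history and len(q.split()) <= 10:
        return False
    return True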
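One caveat with substring checks is that a short keyword can match inside a longer, unrelated word. A possible variant, sketched here with hypothetical word lists, compares whole tokens instead.

```python
import re

# Hypothetical word lists; the real PORTFOLIO_WORDS and CODING_WORDS are not shown here.
PORTFOLIO_WORDS = {"project", "projects", "skills", "experience", "resume"}
CODING_WORDS = {"python", "flask", "rag", "llm", "api"}
ALLOWED = PORTFOLIO_WORDS | CODING_WORDS

def is_off_topic(query, has_history, max_followup_words=10):
    """Return True when the query matches no allowed word and is not a short follow-up."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if ALLOWED.intersection(tokens):
        return False
    if has_history and len(tokens) <= max_followup_words:
        return False
    return True
```

For example, `is_off_topic("tell me more", True)` passes as a short follow-up, while the same text without history is blocked.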

📋 Context Assembly

Top-8 BM25 results are assembled as bullet points. The system prompt is built dynamically: base persona + projects section + timeline section + retrieved context + links.

system_prompt = (
    SYSTEM_BASE
    + _build_projects_section(projects)
    + _build_timeline_section(timeline)
    + f"\n\nRetrieved:\n{context}"
    + links_note
)
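How the `context` string and the links might be produced from scored items can be sketched as follows. The item shape (`text` and optional `url` keys) and the rule of dropping zero-score items are assumptions, not details confirmed by this page.

```python
def build_context(items, scores, top_k=8):
    """Format the highest-scoring items as bullet points and collect their links."""
    ranked = sorted(zip(scores, items), key=lambda pair: pair[0], reverse=True)
    # Drop zero-score items so unrelated entries do not pad the prompt (assumed policy).
    selected = [item for score, item in ranked[:top_k] if score > 0]
    context = "\n".join(f"- {item['text']}" for item in selected)
    links = [item["url"] for item in selected if item.get("url")]
    return context, links

# Hypothetical knowledge items and scores.
items = [
    {"text": "A", "url": "https://example.com/a"},
    {"text": "B"},
    {"text": "C"},
]
context, links = build_context(items, [0.5, 2.0, 0.0])
```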

Streaming SSE

Chat responses stream token-by-token using HTTP Server-Sent Events. The browser sends one POST request; Flask keeps the connection alive and pushes each LLM token as a text/event-stream chunk via stream_with_context. Messages are saved to SQLite only after the full response is complete.

🖥 Browser

1. POST /api/chat/stream
2. Stream connected (Fetch + ReadableStream)
3. Append token → DOM
4. done → stop animation
5. Store session_id

Request and response stream

POST {message, model, session_id}
200 text/event-stream
data: {token: "Hello"}
data: {token: " world"}
data: {token: "!"}
data: {done: true, session_id}

⚙️ Flask Server

1. Rate limit check
2. Load history from SQLite
3. build_rag_messages()
4. LLM stream=True open
5. yield {token: chunk}
6. Save messages to SQLite
7. log_usage() · yield done

🌊 Flask SSE Route

Uses Response(stream_with_context(generate()), content_type='text/event-stream'). The inner generator yields JSON-encoded token chunks as SSE events.

def generate():
    stream = groq.create(..., stream=True)
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if not content:  # some chunks may carry no text
            continue
        yield f"data: {json.dumps({'token':content})}\n\n"
    yield f"data: {json.dumps({'done':True})}\n\n"

return Response(
    stream_with_context(generate()),
    content_type='text/event-stream',
    headers={'Cache-Control': 'no-cache',       # do not cache the stream
             'X-Accel-Buffering': 'no'})        # ask nginx not to buffer
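The event framing itself needs no Flask and can be tested in isolation. A minimal sketch, assuming hypothetical helper names `sse_event` and `stream_tokens`:

```python
import json

def sse_event(payload):
    """Encode a dict as one Server-Sent Events message (data line plus blank line)."""
    return f"data: {json.dumps(payload)}\n\n"

def stream_tokens(deltas, session_id):
    """Yield token events for non-empty deltas, then a final done event."""
    for content in deltas:
        if not content:          # skip None or empty deltas
            continue
        yield sse_event({"token": content})
    yield sse_event({"done": True, "session_id": session_id})

events = list(stream_tokens(["Hello", None, " world", ""], session_id="abc123"))
```

Each message ends with a blank line, which is how the client knows where one event stops and the next begins.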

📡 Client-Side Fetch Stream

The browser uses the Fetch API with a ReadableStream reader rather than EventSource, allowing POST requests with JSON bodies. Tokens are parsed and appended in real-time.

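const resp = await fetch('/api/chat/stream', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({message, model, session_id})
});
const reader = resp.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const {done, value} = await reader.read();
  if (done) break;
  const text = decoder.decode(value, {stream: true});
  // parse SSE lines in text, extract .token, append
}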
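The parsing step has one subtlety: a network chunk can end in the middle of an event, so the parser has to buffer until it sees the blank line that terminates a message. The sketch below shows that buffering logic in Python for illustration only (the page's client is JavaScript); the `SSEParser` class is hypothetical.

```python
import codecs
import json

class SSEParser:
    """Incrementally parse `data:` events from arbitrarily split byte chunks."""

    def __init__(self):
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._buffer = ""

    def feed(self, chunk):
        """Add a chunk of bytes and return the payloads of all completed events."""
        self._buffer += self._decoder.decode(chunk)
        payloads = []
        # Events are separated by a blank line; keep any trailing partial event.
        *complete, self._buffer = self._buffer.split("\n\n")
        for event in complete:
            for line in event.split("\n"):
                if line.startswith("data: "):
                    payloads.append(json.loads(line[len("data: "):]))
        return payloads

parser = SSEParser()
text = ""
chunks = [
    b'data: {"token": "Hel',                                  # event cut mid-way
    b'lo"}\n\ndata: {"token": " world"}\n\n',
    b'data: {"done": true}\n\n',
]
for chunk in chunks:
    for payload in parser.feed(chunk):
        text += payload.get("token", "")
```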

💾 Post-Stream Persistence

Chunks are accumulated during streaming. After done, the full assembled text is saved to SQLite, token usage is logged (with estimated fallback if the API doesn't return usage), and the session ID is emitted.

# After streaming loop ends:
full_text = ''.join(chunks)
_add(session_id, 'user',      user_msg)
_add(session_id, 'assistant', full_text)
log_usage(model, provider,
           tokens_in, tokens_out, latency_ms)
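A self-contained sketch of this step, using an in-memory SQLite database. The table schema, the `save_exchange` helper, and the four-characters-per-token fallback are assumptions for illustration; the real `_add` and `log_usage` functions are not shown on this page.

```python
import sqlite3

def estimate_tokens(text):
    """Rough fallback: about four characters per token (an assumed heuristic)."""
    return max(1, len(text) // 4)

def save_exchange(conn, session_id, user_msg, chunks, usage=None):
    """Persist both messages after streaming and return (tokens_in, tokens_out)."""
    full_text = "".join(chunks)
    with conn:  # commit both rows together, or roll back on error
        conn.executemany(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            [(session_id, "user", user_msg), (session_id, "assistant", full_text)],
        )
    if usage is not None:
        return usage["prompt_tokens"], usage["completion_tokens"]
    return estimate_tokens(user_msg), estimate_tokens(full_text)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (session_id TEXT, role TEXT, content TEXT)")
tokens_in, tokens_out = save_exchange(conn, "abc123", "Hi there", ["Hello", " world", "!"])
rows = conn.execute("SELECT role, content FROM messages").fetchall()
```

Writing both rows in one transaction means an interrupted stream leaves no half-saved exchange behind.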

⚠️ Off-Topic Streaming

Even blocked responses stream word-by-word for consistent UX. Off-topic detection runs before the LLM call; the canned reply is split on spaces and yielded as token events.

if is_off_topic(message, has_history):
    def _off():
        for word in OFF_TOPIC_REPLY.split(' '):
            yield (
                f"data: "
                + json.dumps({'token': word+' '})
                + "\n\n"
            )
        yield f"data: {json.dumps({'done':True})}\n\n"
    return Response(stream_with_context(_off()), ...)

Agent Orchestration

A real multi-step Research Agent runs live on this portfolio at /agent ↗. It implements the ReAct pattern with a custom StateGraph: three typed nodes connected by a conditional loop edge. The design mirrors LangGraph's node-and-edge architecture without depending on the library, which keeps deployment lightweight.
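A custom graph of this kind can be very small. The following is a minimal sketch, not the portfolio's actual code: the `MiniStateGraph` class, the state fields, and the node bodies are hypothetical, and the node names follow the Plan → Research → Synthesize steps mentioned on this page.

```python
from typing import Callable, Dict, List, TypedDict

class AgentState(TypedDict):
    task: str
    notes: List[str]
    answer: str

END = "__end__"

class MiniStateGraph:
    """Tiny state machine: named nodes plus one routing function per node."""

    def __init__(self):
        self.nodes: Dict[str, Callable[[AgentState], AgentState]] = {}
        self.routers: Dict[str, Callable[[AgentState], str]] = {}

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def add_edge(self, source, router):
        """`router` inspects the state and returns the next node name or END."""
        self.routers[source] = router

    def run(self, entry, state, max_steps=20):
        current = entry
        for _ in range(max_steps):      # guard against an endless loop
            if current == END:
                return state
            state = self.nodes[current](state)
            current = self.routers[current](state)
        raise RuntimeError("max_steps exceeded")

def plan(state):
    return {**state, "notes": []}

def research(state):
    # Placeholder for a search or retrieval call.
    return {**state, "notes": state["notes"] + [f"finding {len(state['notes']) + 1}"]}

def synthesize(state):
    return {**state, "answer": "; ".join(state["notes"])}

graph = MiniStateGraph()
graph.add_node("plan", plan)
graph.add_node("research", research)
graph.add_node("synthesize", synthesize)
graph.add_edge("plan", lambda s: "research")
# Conditional loop edge: keep researching until two findings exist.
graph.add_edge("research", lambda s: "research" if len(s["notes"]) < 2 else "synthesize")
graph.add_edge("synthesize", lambda s: END)

final = graph.run("plan", {"task": "demo", "notes": [], "answer": ""})
```

The `max_steps` guard is a design choice worth keeping in any loop-capable graph, since a routing bug would otherwise spin forever.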

LangGraph · StateGraph Pattern

1. 📝 Task Input / User Prompt
2. 🔷 StateGraph: AgentState (TypedDict)
3. ⚡ Conditional Router / Edge Function, which routes to one of:
   🔍 Research Node · Web Search / RAG · state.research ✓
   🧮 Analysis Node · Code Executor · state.analysis ✓
   ✍️ Writer Node · Formatter / Template · state.draft ✓
4. 🔗 State Aggregator + Checkpointer
5. ✅ Final Output / END Node

CrewAI · Sequential Multi-Agent Pattern

1. 👔 Researcher: gather information
2. 📊 Analyst: process & synthesise
3. ✍️ Writer: produce output
4. 📄 Structured Report: final deliverable

Sequential execution · Each agent sees prior agent output · Role + Goal + Backstory defined per agent

🔷 LangGraph StateGraph

A TypedDict state flows through the nodes. Conditional edges route execution based on state values. Checkpointers enable pause-and-resume and human-in-the-loop review.

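from langgraph.graph import StateGraph
from langgraph.prebuilt import ToolNode

graph = StateGraph(AgentState)
# router_node and analysis_node are assumed to be defined like research_node
graph.add_node("router",   router_node)
graph.add_node("research", research_node)
graph.add_node("analysis", analysis_node)
graph.add_node("tools",    ToolNode(tools))
# route_fn inspects the state and returns "research" or "analysis"
graph.add_conditional_edges(
    "router", route_fn,
    {"research": "research",
     "analysis": "analysis"})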

🔧 Tool Calling

Each node binds tools to the LLM with .bind_tools(). The model returns tool_calls in its message; the graph routes to a ToolNode, executes, and feeds the result back into state.

llm_with_tools = llm.bind_tools([
    web_search,
    code_executor,
    retriever,
])

# ToolNode handles execution automatically
tool_node = ToolNode(tools=[
    web_search, code_executor, retriever
])
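Conceptually, the tool-execution step looks up each requested tool by name, runs it, and appends the result to the state as an observation. The sketch below illustrates that idea in plain Python; it is not LangGraph's ToolNode implementation, and the tool bodies and the shape of the tool-call dictionaries are assumptions.

```python
def web_search(query):
    """Stand-in tool; a real one would call a search API."""
    return f"results for {query!r}"

def code_executor(expression):
    """Stand-in tool that only evaluates simple arithmetic like '2 + 3'."""
    left, op, right = expression.split()
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    return str(ops[op](float(left), float(right)))

TOOLS = {"web_search": web_search, "code_executor": code_executor}

def run_tool_calls(tool_calls):
    """Execute each requested tool and return observation messages for the state."""
    observations = []
    for call in tool_calls:
        tool = TOOLS.get(call["name"])
        if tool is None:
            result = f"error: unknown tool {call['name']}"
        else:
            result = tool(**call["args"])
        observations.append(
            {"role": "tool", "tool_call_id": call["id"], "content": result}
        )
    return observations

observations = run_tool_calls([
    {"id": "call_1", "name": "code_executor", "args": {"expression": "2 + 3"}},
    {"id": "call_2", "name": "missing_tool", "args": {}},
])
```

Returning an error message for an unknown tool, rather than raising, lets the model see the failure and try a different action.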

🤝 CrewAI Agents

Agents are defined with role, goal, and backstory. Tasks are assigned per-agent and executed sequentially. The crew orchestrator manages handoffs and shared context.

from crewai import Agent, Crew, Process

researcher = Agent(
  role="Research Analyst",
  goal="Find accurate information",
  backstory="Expert at gathering data",
  tools=[search_tool])

# analyst, writer, and task1..task3 are defined the same way
crew = Crew(
  agents=[researcher, analyst, writer],
  tasks=[task1, task2, task3],
  process=Process.sequential)

🔄 ReAct Loop

All agents follow Reason → Act → Observe. The LLM reasons about the goal, picks a tool action, receives the observation, and loops until it decides the task is complete.

# ReAct cycle (conceptual)
while True:
    thought = llm.think(state, tools)
    if thought.is_final_answer:
        break
    obs = thought.tool.invoke(thought.args)
    state.update(observation=obs)
    # loop: reason again with observation
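A runnable version of the cycle, with a scripted function standing in for the LLM so the control flow can be followed end to end. The `lookup` tool, the `scripted_policy` function, and the state layout are hypothetical.

```python
def lookup(topic):
    """Stand-in tool backed by a tiny in-memory table."""
    facts = {"bm25": "BM25 is a keyword ranking function."}
    return facts.get(topic.lower(), "no result")

TOOLS = {"lookup": lookup}

def scripted_policy(state):
    """Stand-in for the LLM: act once, then answer from the observation."""
    if not state["observations"]:
        return {"action": "lookup", "args": {"topic": state["goal"]}}
    return {"final_answer": state["observations"][-1]}

def react(goal, policy, max_steps=5):
    """Run Reason -> Act -> Observe until the policy returns a final answer."""
    state = {"goal": goal, "observations": []}
    for _ in range(max_steps):
        thought = policy(state)                                    # Reason
        if "final_answer" in thought:
            return thought["final_answer"]
        observation = TOOLS[thought["action"]](**thought["args"])  # Act
        state["observations"].append(observation)                  # Observe
    return "stopped: step limit reached"

answer = react("BM25", scripted_policy)
```

The step limit replaces an open-ended loop so a model that never produces a final answer cannot run indefinitely.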

This agent runs live on this portfolio

Try it — ask about projects, skills, or any AI/ML concept. Watch the Plan → Research → Synthesize steps execute in real-time.


My Projects

Architecture diagrams are auto-generated from each project's GitHub README. Add a GitHub URL to any project in the Admin Panel, then click the 🏗 Arch button to generate its diagram.

Project Architecture

🔗 Advanced RAG Chatbot

The Advance_Rag_Chatbot system is a production-ready, full-stack RAG chatbot built with Python, Flask, and ChromaDB. It uses a bi-encoder for retrieval, a cross-encoder for re-ranking, and a large language model for generation. The system has a sliding-window session memory for multi-turn coherence and supports evaluation metrics like faithfulness, relevancy, precision, and recall.

GitHub ↗

RAG · Cross-encoder re-ranking · Sentence-transformers · ChromaDB · Vector search · Flask · Session management · RAGAS evaluation · Faithfulness scoring · OpenAI · Ollama

LLM: OpenAI GPT or Ollama
Retrieval: ChromaDB with cosine similarity
Response: Streaming SSE
Storage: ChromaDB
Evaluation: RAGAS-inspired metrics

Data / Request Flow

1. 💬 User Query (input): The user sends a query to the chatbot.
2. 🔍 Bi-Encoder Retrieval (process): The bi-encoder retrieves relevant documents from ChromaDB.
3. ⚙️ Cross-Encoder Re-Ranking (process): The cross-encoder re-ranks the retrieved documents for precision.
4. 🧠 LLM Generation (process): The large language model generates an answer based on the re-ranked context.
5. ✅ Response Output (output): The chatbot returns the generated answer to the user.
6. 💾 Session Memory (storage): The chatbot stores the conversation history in a sliding-window session memory.
7. 📊 Evaluation Metrics (process): The chatbot calculates evaluation metrics such as faithfulness, relevancy, precision, and recall.
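The retrieve-then-re-rank idea and the sliding-window memory from the flow above can be sketched without any model. Both scoring functions below are toy stand-ins (word overlap for the bi-encoder, an exact-phrase bonus for the cross-encoder), and all names are hypothetical; the project itself uses real encoders and ChromaDB.

```python
from collections import deque

def cheap_score(query, doc):
    """Stand-in for bi-encoder similarity: fraction of query words found in the doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def precise_score(query, doc):
    """Stand-in for a cross-encoder: adds a bonus when the exact phrase appears."""
    return cheap_score(query, doc) + (1.0 if query.lower() in doc.lower() else 0.0)

def retrieve_then_rerank(query, docs, fetch_k=3, top_k=2):
    """Stage 1 keeps fetch_k candidates cheaply; stage 2 re-orders them precisely."""
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:fetch_k]
    return sorted(candidates, key=lambda d: precise_score(query, d), reverse=True)[:top_k]

class SlidingWindowMemory:
    """Keep only the most recent max_turns (user, assistant) pairs."""

    def __init__(self, max_turns=3):
        self.turns = deque(maxlen=max_turns)

    def add(self, user, assistant):
        self.turns.append((user, assistant))

docs = [
    "search vector databases with cosine similarity",
    "vector search uses cosine similarity in ChromaDB",
    "Flask serves the chat API",
]
ranked = retrieve_then_rerank("vector search", docs)

memory = SlidingWindowMemory(max_turns=2)
for i in range(3):
    memory.add(f"question {i}", f"answer {i}")
```

The first two documents tie in the cheap stage; only the second stage separates them, which is the reason for paying the extra cost of re-ranking on a small candidate set.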

Tech Stack Layers

Interface: Flask API · HTML/CSS/JS
Orchestration: Bi-Encoder · Cross-Encoder · LLM
Storage & Cache: ChromaDB · Session Memory
External APIs: OpenAI · Ollama

🤖 Auto-generated from GitHub README · 2026-03-19

Project Architecture

📈 Finance AI Agent

The finance-ai-agent system is an autonomous AI investment committee that debates stocks using LangGraph and Groq's Llama 3.3, with human-in-the-loop approval and professional PDF report generation. It utilizes three live data sources: yfinance, SEC EDGAR, and Tavily. The system's architecture involves a planner agent, bull and bear analysts, a risk auditor, and a CIO judge, all working together to provide a comprehensive stock analysis.

GitHub ↗

LangGraph · Multi-agent · Groq Llama 3.3 · OpenAI GPT-4o · yfinance · Tavily · Human-in-the-loop · State machine · Streamlit

LLM: Groq Llama 3.3
Data Sources: 3 (yfinance, SEC EDGAR, Tavily)
Report Format: PDF
Approval Mechanism: Human-in-the-Loop
Analysis Type: Stock Debate and Risk Audit

Data / Request Flow

1. 📈 Ticker Input (input): The system starts by receiving a stock ticker input from the user.
2. 🔍 Data Pipeline (process): The system then retrieves data from yfinance, SEC EDGAR, and Tavily, and processes it for analysis.
3. 🧠 Bull and Bear Analysis (process): The system's bull and bear analysts analyze the data and provide their respective recommendations.
4. 💬 Debate Round (process): The bull and bear analysts engage in a debate, with each trying to convince the CIO judge of their recommendation.
5. 🚨 Risk Audit (process): The system's risk auditor evaluates the stock's risk and provides a score.
6. ⚖️ CIO Judge Verdict (decision): The CIO judge makes a final verdict based on the analysis and debate.
7. 👥 Human-in-the-Loop Approval (input): The system requires human approval before generating a professional PDF report.
8. 📄 PDF Report Generation (output): The system generates a professional PDF report based on the analysis and verdict.
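The approval gate in the flow above is the part most worth illustrating: nothing downstream runs until a human says yes. The sketch below is a toy, with no LLM and no live data. The `run_committee` function, the input fields, the risk formula, and the verdict thresholds are all invented for illustration and are not taken from the project.

```python
def run_committee(ticker, data, approve):
    """Run a toy debate pipeline; `approve` is a callback standing in for the human reviewer."""
    bull = f"Bull case for {ticker}: revenue growth {data['growth']:.0%}"
    bear = f"Bear case for {ticker}: debt ratio {data['debt_ratio']:.2f}"
    # Toy risk score in [0, 10]; the real auditor's scoring is not described here.
    risk = min(10, round(data["debt_ratio"] * 10))
    # Toy verdict rule standing in for the CIO judge.
    verdict = "BUY" if data["growth"] > 0.10 and risk < 7 else "HOLD"
    result = {"ticker": ticker, "bull": bull, "bear": bear, "risk": risk, "verdict": verdict}
    # Human-in-the-loop gate: no report is produced without approval.
    if approve(result):
        result["report"] = f"{ticker}: {verdict} (risk {risk}/10)"
    else:
        result["report"] = None
    return result

sample = {"growth": 0.15, "debt_ratio": 0.4}   # made-up figures
approved = run_committee("ACME", sample, approve=lambda r: True)
rejected = run_committee("ACME", sample, approve=lambda r: False)
```

Passing the reviewer in as a callback keeps the pipeline testable: a test can approve or reject deterministically, while the live app can block on a real person.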

Tech Stack Layers

Interface: Streamlit · ReportLab
Orchestration: LangGraph · Groq
AI / Model: Llama 3.3 · OpenAI
Data Sources: yfinance · SEC EDGAR · Tavily

🤖 Auto-generated from GitHub README · 2026-03-19
