🔭

LLMWatch

LangChain LangGraph RAGAS LangSmith SQLite PostgreSQL Streamlit GitHub Actions Python

GitHub ↗

Overview

LLMWatch is a plug-and-play LLM observability and evaluation platform for production agents. A drop-in callback handler wraps any LangChain or LangGraph agent in two lines of code and tracks cost per query, latency percentiles (P50/P95/P99), token usage (split into input and output), and hallucination rate via RAGAS. A live Streamlit dashboard shows 6 KPI cards and 4 real-time charts. A GitHub Actions workflow runs RAGAS evaluation on every PR, posts the scores as a PR comment, and fails CI if quality drops below a set threshold. Multi-model pricing covers OpenAI, Anthropic Claude, Groq, and Gemini, with configurable budget and latency alerts.
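To make the tracked metrics concrete, here is a minimal standard-library sketch of how cost per query, the input/output token split, latency percentiles, and a budget alert could be computed. It is not LLMWatch's actual code or API: the names `MetricsTracker`, `query_cost`, `percentile`, and `PRICES_PER_MILLION` are hypothetical, the prices are placeholder values rather than real provider rates, and the nearest-rank percentile method is an assumption.

```python
import math
from dataclasses import dataclass, field

# Hypothetical USD prices per 1M tokens as (input, output).
# Placeholder numbers for illustration only, not real provider rates.
PRICES_PER_MILLION = {
    "model-a": (1.00, 2.00),
    "model-b": (0.50, 1.50),
}


def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one query in USD, pricing input and output tokens separately."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile (an assumed method; others interpolate)."""
    if not values:
        raise ValueError("no values recorded")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass
class MetricsTracker:
    """Accumulates per-query metrics; a hypothetical stand-in for a callback handler."""

    budget_usd: float = 1.0
    latencies_ms: list[float] = field(default_factory=list)
    total_cost_usd: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, model: str, input_tokens: int,
               output_tokens: int, latency_ms: float) -> float:
        """Record one query and return its cost in USD."""
        cost = query_cost(model, input_tokens, output_tokens)
        self.total_cost_usd += cost
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.latencies_ms.append(latency_ms)
        return cost

    def over_budget(self) -> bool:
        """True once cumulative spend exceeds the configured budget."""
        return self.total_cost_usd > self.budget_usd

    def summary(self) -> dict[str, float]:
        """Latency percentiles plus cumulative cost and token totals."""
        return {
            "p50_ms": percentile(self.latencies_ms, 50),
            "p95_ms": percentile(self.latencies_ms, 95),
            "p99_ms": percentile(self.latencies_ms, 99),
            "total_cost_usd": self.total_cost_usd,
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
        }
```

In a real integration, `record` would be called from a callback hook once each LLM call finishes, using the token counts and timing the framework reports. Pricing input and output tokens separately matters because providers typically charge different rates for each.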
