A plug-and-play LLM observability and evaluation platform for production agents. Drop-in callback handler wraps any LangChain or LangGraph agent in 2 lines โ instantly tracks cost per query, latency percentiles (P50/P95/P99), token usage (input vs output split), and hallucination rate via RAGAS. Includes a live Streamlit dashboard with 6 KPI cards and 4 real-time charts, plus a GitHub Actions workflow that auto-runs RAGAS evaluation on every PR and posts scores as a PR comment โ failing CI if quality drops below threshold. Supports multi-model pricing (OpenAI, Anthropic Claude, Groq, Gemini) with configurable budget and latency alerts.
Have questions about this project? Ask my AI assistant for details.
Ask AI about this โ