Agentic diagnostic assistant for distributed-system incidents: multi-turn RAG, hypothesis updates, evidence packing, golden evals, and failure-attributed run reports.
Updated May 22, 2026 - Rust
RAG evaluation harness with RAGAS, TruLens, LLM-as-Judge, golden sets, Recall@k, nDCG, and faithfulness.
Portfolio-grade AI quality evaluation lab with golden datasets, prompt regression, groundedness checks, hallucination tests and CI thresholds.