Provider-agnostic RAG evaluation harness powered by RAGAS with pluggable LLM and embedding backends.
-
Updated
Jun 2, 2026 - Python
Provider-agnostic RAG evaluation harness powered by RAGAS with pluggable LLM and embedding backends.
Pluggable DeepEval scaffold for RAG, agents, and LLM apps across Anthropic, Bedrock, Azure OpenAI, and Vertex. Ships traceability, test synthesis, safety/PII gating, multi-turn conversation eval, agentic tool-use scoring, JSON validation, judge benchmarks, hyperparameter sweeps, and pytest CI — one Makefile target per feature.
Drop-in TruLens evaluation harness for tool-calling LangGraph agents. Swap LLM providers (OpenAI, Anthropic via LiteLLM, Bedrock, Cortex, Gemini, Ollama) with a single env var. Ships with the RAG Triad plus Plan Quality, Plan Adherence, Execution Efficiency, and Logical Consistency metrics.
enjoy harness ai, auto drive, Autonomous harness driving
Building AI developer tools in Rust 🦀 | MCP servers • Agentic CI/CD • Code Intelligence • Hodei Platform
Add a description, image, and links to the harness-ai topic page so that developers can more easily learn about it.
To associate your repository with the harness-ai topic, visit your repo's landing page and select "manage topics."