harness-ai

Here are 5 public repositories matching this topic...

sunilp303 / ragas-evaluation-harness

Provider-agnostic RAG evaluation harness powered by RAGAS with pluggable LLM and embedding backends.

bedrock evaluation-metrics vertex-ai azure-openai anthropic ollama rag-evaluation ragas harness-ai

Updated Jun 2, 2026
Python

sunilp303 / deepeval-evaluation-harness

Pluggable DeepEval scaffold for RAG, agents, and LLM apps across Anthropic, Bedrock, Azure OpenAI, and Vertex. Ships traceability, test synthesis, safety/PII gating, multi-turn conversation eval, agentic tool-use scoring, JSON validation, judge benchmarks, hyperparameter sweeps, and pytest CI — one Makefile target per feature.

bedrock evaluation-metrics vertex-ai azure-openai anthropic ollama rag-evaluation ragas deepeval harness-ai deepeval-metrics

Updated Jun 3, 2026
Python

sunilp303 / trulens-agent-starter

Drop-in TruLens evaluation harness for tool-calling LangGraph agents. Swap LLM providers (OpenAI, Anthropic via LiteLLM, Bedrock, Cortex, Gemini, Ollama) with a single env var. Ships with the RAG Triad plus Plan Quality, Plan Adherence, Execution Efficiency, and Logical Consistency metrics.