harness-ai

Here are 5 public repositories matching this topic...

Pluggable DeepEval scaffold for RAG, agents, and LLM apps across Anthropic, Bedrock, Azure OpenAI, and Vertex. Ships traceability, test synthesis, safety/PII gating, multi-turn conversation eval, agentic tool-use scoring, JSON validation, judge benchmarks, hyperparameter sweeps, and pytest CI — one Makefile target per feature.

  • Updated Jun 3, 2026
  • Python

Drop-in TruLens evaluation harness for tool-calling LangGraph agents. Swap LLM providers (OpenAI, Anthropic via LiteLLM, Bedrock, Cortex, Gemini, Ollama) with a single env var. Ships with the RAG Triad plus Plan Quality, Plan Adherence, Execution Efficiency, and Logical Consistency metrics.

  • Updated Jun 3, 2026
  • Python
