A lightweight, general-purpose harness for tool-using LLM agents, fair benchmark evaluation, harness baselines, and personal assistant workflows.
Topics: agent, science, benchmarking, evaluation, assistant, gemini, baseline, glm, gpt, harness, llm, qwen, tool-calling, openai-compatible, agent-harness, researchclawbench
Updated Jun 29, 2026 - Python