Hi team! First of all, thank you so much for EcoClaw — the idea of routing to the cheapest capable model based on real benchmark data is brilliant, and the cost savings are very impressive. Your research work on LLM routing (Avengers, AvengersPro, LLMRouterBench, etc.) is also incredibly valuable to the community. Great job! 🙌
I noticed that EcoClaw currently uses PinchBench data for model selection and scoring. I'm curious whether there are any plans to support Claw-Eval as an additional benchmark source?
Claw-Eval is an end-to-end benchmark specifically designed for AI agents acting as personal assistants, with 104 tasks, 15 mock enterprise services, Docker sandboxes, and deterministic grading. It could be a great complement to PinchBench for evaluating model capabilities in agentic scenarios.
Specifically, I'm wondering:
- Would it be feasible to incorporate Claw-Eval scores into EcoClaw's routing decisions (e.g., as an additional signal alongside PinchBench)?
- Or has there been any testing of EcoClaw's routing accuracy against the Claw-Eval benchmark?
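To make the first question more concrete, here is a rough sketch of what I mean by "an additional signal". Everything in it is hypothetical: the `ModelStats` fields, the model names, the scores, the prices, and the `claw_weight` parameter are invented for illustration, and I don't know how EcoClaw actually computes or stores its scores internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelStats:
    # All fields are hypothetical placeholders, not EcoClaw's real schema.
    name: str
    cost_per_mtok: float  # made-up price, USD per million tokens
    pinchbench: float     # made-up PinchBench score, normalized to [0, 1]
    claw_eval: float      # made-up Claw-Eval score, normalized to [0, 1]


def blended_score(m: ModelStats, claw_weight: float = 0.5) -> float:
    """Weighted average of the two benchmark scores.

    claw_weight=0.0 ignores Claw-Eval entirely (PinchBench only).
    """
    return (1.0 - claw_weight) * m.pinchbench + claw_weight * m.claw_eval


def pick_cheapest_capable(
    models: list[ModelStats], threshold: float, claw_weight: float = 0.5
) -> Optional[ModelStats]:
    """Return the cheapest model whose blended score meets the threshold."""
    capable = [m for m in models if blended_score(m, claw_weight) >= threshold]
    if not capable:
        return None
    return min(capable, key=lambda m: m.cost_per_mtok)


# Invented example data, only to show how the choice could shift.
MODELS = [
    ModelStats("model-a", cost_per_mtok=0.5, pinchbench=0.80, claw_eval=0.40),
    ModelStats("model-b", cost_per_mtok=2.0, pinchbench=0.75, claw_eval=0.85),
    ModelStats("model-c", cost_per_mtok=8.0, pinchbench=0.90, claw_eval=0.90),
]

if __name__ == "__main__":
    print(pick_cheapest_capable(MODELS, threshold=0.7, claw_weight=0.0))
    print(pick_cheapest_capable(MODELS, threshold=0.7, claw_weight=0.5))
```

With these invented numbers, a PinchBench-only setting would pick the cheapest model, while giving Claw-Eval some weight would push the choice to a model that does better on agentic tasks. A simple weighted average is just one option; a per-benchmark minimum would be another.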
Thanks again for your amazing work, and looking forward to hearing your thoughts!