Benchmarking strategy #12
Open
Description
lukaskellerstein
opened on Mar 7, 2025
- Define configurations for agents
- Choose scenarios that need to be tested
- Translate scenarios into instructions for agents
  - Define OS state
  - Define App state
  - Define evaluation condition (= definition of success)
- Record (Teams) telemetry for scenarios
  - Create a distilled version of what happened
- Define strategy for benchmarking
  - How many times will each config run?
  - How will we measure / calculate success?
- Benchmarks
  - https://github.com/ltzheng/agent-studio/tree/main
  - https://microsoft.github.io/autogen/0.2/blog/2024/01/25/AutoGenBench
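
To make the open questions above more concrete, here is a rough sketch of one possible shape for a scenario definition and a success measurement. Everything in it is an assumption for discussion: the names `Scenario`, `run_benchmark`, `wilson_interval`, and `make_flaky_agent` are hypothetical and do not come from agent-studio or AutoGenBench, and the state dictionaries are placeholders for whatever OS/App state we end up recording. Success is measured here as the fraction of successful runs per config, with a 95% Wilson score interval to show how much the number of runs affects certainty.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Dict[str, str]


@dataclass
class Scenario:
    """Hypothetical scenario record: instruction, initial state, success check."""
    name: str
    instruction: str                      # text handed to the agent
    os_state: State                       # assumed initial OS state
    app_state: State                      # assumed initial App state
    is_success: Callable[[State], bool]   # evaluation condition on the final state


def wilson_interval(successes: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a success rate (z=1.96 gives about 95%)."""
    if runs == 0:
        return (0.0, 1.0)
    p = successes / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def run_benchmark(agent: Callable[[Scenario], State], scenario: Scenario, runs: int) -> dict:
    """Run one agent config `runs` times on one scenario and summarize success."""
    successes = 0
    for _ in range(runs):
        final_state = agent(scenario)
        if scenario.is_success(final_state):
            successes += 1
    return {
        "scenario": scenario.name,
        "runs": runs,
        "successes": successes,
        "success_rate": successes / runs if runs else 0.0,
        "ci95": wilson_interval(successes, runs),
    }


def make_flaky_agent(fail_every: int) -> Callable[[Scenario], State]:
    """Stand-in for a real agent config: fails on every `fail_every`-th call."""
    calls = {"n": 0}

    def agent(scenario: Scenario) -> State:
        calls["n"] += 1
        sent = calls["n"] % fail_every != 0
        return {**scenario.app_state, "message_sent": "yes" if sent else "no"}

    return agent


# Placeholder scenario; the state keys are invented for illustration.
example = Scenario(
    name="send_chat_message",
    instruction="Send 'hello' to the test channel.",
    os_state={"logged_in": "yes"},
    app_state={"channel": "test", "message_sent": "no"},
    is_success=lambda state: state.get("message_sent") == "yes",
)

if __name__ == "__main__":
    print(run_benchmark(make_flaky_agent(fail_every=4), example, runs=8))
```

The interval is one way to reason about "how many times": with few runs the interval is wide, so two configs with similar success rates cannot be told apart until the run count goes up.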