Currently evals work like this:
- create new eval
- re-running is possible ONLY WHEN the prompt version changes
- evaluate outcome
Generally I want to keep this pattern, but I also want to be able to specify how many "rounds" to run for each eval. For example, you might want to run each eval 5 times to really stress test things.
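To make the idea concrete, here is a rough sketch of how a "rounds" setting could sit on top of the current pattern. Everything in it is an assumption for illustration: the `Eval` class, its `rounds` and `last_prompt_version` fields, and the `run_eval` / `pass_rate` helpers are hypothetical names, not the existing implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Eval:
    """Hypothetical eval record; field names are illustrative only."""
    name: str
    rounds: int = 1  # how many times to run the eval per prompt version
    last_prompt_version: Optional[str] = None  # version used by the last run
    results: List[bool] = field(default_factory=list)


def run_eval(ev: Eval, prompt_version: str,
             run_once: Callable[[int], bool]) -> List[bool]:
    """Run `ev` for `ev.rounds` rounds, keeping the existing re-run rule."""
    if ev.rounds < 1:
        raise ValueError("rounds must be at least 1")
    # Existing pattern: a re-run is only allowed when the prompt version changed.
    if ev.last_prompt_version == prompt_version:
        raise RuntimeError("prompt version unchanged; re-run not allowed")
    # New part: repeat the same eval `rounds` times and keep every outcome.
    ev.results = [run_once(round_index) for round_index in range(ev.rounds)]
    ev.last_prompt_version = prompt_version
    return ev.results


def pass_rate(results: List[bool]) -> float:
    """Fraction of rounds that passed (0.0 if there are no results)."""
    return sum(results) / len(results) if results else 0.0


# Usage: 5 rounds against prompt version "v1"; the lambda is a stand-in
# for whatever actually executes and scores a single round.
example = Eval(name="example-eval", rounds=5)
example_results = run_eval(example, "v1", lambda round_index: round_index != 2)
example_rate = pass_rate(example_results)
```

Evaluating the outcome would then mean looking at all rounds together (for instance a pass rate) rather than a single run, while the "only re-run on a new prompt version" rule stays as it is.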