(feat) Run evals N times

Currently evals work like this: 
1. create new eval
2. you can re-run ONLY WHEN the prompt version changes
3. evaluate outcome

Generally I want to keep this pattern, but I also want to specify how many "rounds" to run for each eval. For example, you might run each eval 5 times to really stress test things.
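
To make the request concrete, here is a rough sketch of what "rounds" could look like. It is not tied to the existing codebase: `run_eval_rounds`, `RoundResult`, and `pass_rate` are hypothetical names, and `run_once` stands in for whatever currently executes a single eval and reports pass/fail.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RoundResult:
    """Outcome of one round of an eval (hypothetical shape)."""
    round_index: int  # 1-based round number
    passed: bool


def run_eval_rounds(run_once: Callable[[], bool], rounds: int = 1) -> List[RoundResult]:
    """Run the same eval `rounds` times and collect each outcome.

    `run_once` is an assumed callable that executes a single eval run
    and returns True on pass, False on fail.
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    return [RoundResult(i, run_once()) for i in range(1, rounds + 1)]


def pass_rate(results: List[RoundResult]) -> float:
    """Fraction of rounds that passed."""
    return sum(r.passed for r in results) / len(results)
```

With `rounds=1` as the default, the current behavior stays the same, and the "evaluate outcome" step could show a per-round breakdown plus an aggregate such as the pass rate.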