1. Your function receives case data via `ForEach`/`From` (same as parameterized tests)
2. It returns the output (string, object, anything)
3. ProTest passes the output to evaluators → scores
-4. Scores determine pass/fail via thresholds
+4. Bool verdicts determine pass/fail
5. Aggregated stats appear in the terminal
The rest of the pipeline — fixtures, DI, parallelism, reporters — works identically to tests.
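
The five steps above can be sketched in plain Python. This is a minimal illustration, not ProTest's implementation: `Case`, `run_eval`, and `summarize` are hypothetical names, and the rule that a case passes only when every verdict is `True` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-ins for illustration; these are NOT ProTest's real API.
@dataclass
class Case:
    name: str
    inputs: str
    expected: str

Evaluator = Callable[[str, str], bool]

def run_eval(task: Callable[[str], str], cases: List[Case],
             evaluators: List[Evaluator]) -> Dict[str, bool]:
    results = {}
    for case in cases:
        output = task(case.inputs)                  # steps 1-2: feed case data, get output
        verdicts = [evaluate(output, case.expected)  # step 3: output goes to evaluators
                    for evaluate in evaluators]
        results[case.name] = all(verdicts)          # step 4 (assumed rule): all verdicts True
    return results

def summarize(results: Dict[str, bool]) -> str:
    # Step 5: aggregate per-case outcomes into one line.
    passed = sum(results.values())
    return f"{passed}/{len(results)} cases passed"

def exact_match(output: str, expected: str) -> bool:
    return output == expected

def non_empty(output: str, expected: str) -> bool:
    return bool(output)

cases = [Case("upper-ok", "abc", "ABC"), Case("upper-bad", "abc", "abc")]
results = run_eval(str.upper, cases, [exact_match, non_empty])
print(summarize(results))
```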
@@ -87,15 +87,23 @@ An evaluator is a function decorated with `@evaluator` that receives an `EvalCon
### Return Types
-Evaluators return `bool` (simple verdict) or a `dataclass` (structured result). The framework reads fields by type:
+Evaluators return `bool` (simple verdict) or a `dataclass` (structured result). In dataclasses, annotate fields to tell the framework what each one is:
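
One way to picture annotated result fields is with `typing.Annotated`. The sketch below is an assumption about the general technique, not ProTest's actual mechanism: the marker classes `Verdict` and `Metric`, the `LengthResult` dataclass, and the `split_result` helper are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, fields
from typing import Annotated, get_args, get_origin

# Hypothetical marker classes; ProTest's real annotation names may differ.
class Verdict: ...
class Metric: ...

@dataclass
class LengthResult:
    within_limit: Annotated[bool, Verdict]  # counts toward pass/fail
    length: Annotated[int, Metric]          # reported, not a verdict

def split_result(result):
    """Sort an evaluator's return value into verdicts and metrics."""
    if isinstance(result, bool):
        # A bare bool is treated as a single verdict.
        return {"verdicts": {"verdict": result}, "metrics": {}}
    out = {"verdicts": {}, "metrics": {}}
    for f in fields(result):
        # Everything after the first Annotated argument is metadata.
        markers = get_args(f.type)[1:] if get_origin(f.type) is Annotated else ()
        value = getattr(result, f.name)
        if Verdict in markers:
            out["verdicts"][f.name] = value
        elif Metric in markers:
            out["metrics"][f.name] = value
    return out

def length_check(output: str, limit: int = 10) -> LengthResult:
    return LengthResult(within_limit=len(output) <= limit, length=len(output))

print(split_result(length_check("hello")))
```

Fields without a recognized marker are ignored in this sketch; a real framework might instead reject them or treat them as metadata.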
@@ -334,7 +344,7 @@ protest history --evals --compare
Each case in history carries two hashes:
- **`case_hash`** — hash of inputs + expected output. Changes when the test data changes.
-- **`eval_hash`** — hash of evaluators + thresholds. Changes when the scoring criteria change.
+- **`eval_hash`** — hash of evaluators. Changes when the scoring criteria change.
`protest history --compare` uses these hashes to detect modified cases vs regressions. If a case's `eval_hash` changed between runs, it's reported as "scoring modified" rather than a real regression.
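
The comparison logic described above might look roughly like this. It is a sketch under stated assumptions: the helper names, the record layout, hashing evaluators by name, and every label other than "scoring modified" are hypothetical, and ProTest may compute its hashes differently.

```python
import hashlib
import json

def stable_hash(payload) -> str:
    # Serialize deterministically so equal data always yields the same hash.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]

def case_hash(inputs, expected) -> str:
    # Changes when the test data changes.
    return stable_hash({"inputs": inputs, "expected": expected})

def eval_hash(evaluator_names) -> str:
    # Assumption: evaluators are identified by name for hashing purposes.
    return stable_hash(sorted(evaluator_names))

def classify(previous: dict, current: dict) -> str:
    """Compare one case across two runs (hypothetical record layout)."""
    if previous["eval_hash"] != current["eval_hash"]:
        return "scoring modified"
    if previous["case_hash"] != current["case_hash"]:
        return "case modified"
    if previous["passed"] and not current["passed"]:
        return "regression"
    return "no regression"

old_run = {"case_hash": case_hash("abc", "ABC"),
           "eval_hash": eval_hash(["exact_match"]), "passed": True}
new_run = {"case_hash": case_hash("abc", "ABC"),
           "eval_hash": eval_hash(["exact_match", "non_empty"]), "passed": False}
print(classify(old_run, new_run))
```

Checking `eval_hash` first means a failure that coincides with changed scoring criteria is never reported as a regression, which matches the behavior the text describes.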