feat(evaluator): expose two-pass extraction in UI and docs #38

Open
DeryFerd wants to merge 1 commit into anvie:main from DeryFerd:feat/evaluator-two-pass-exposure

@DeryFerd
Contributor

Summary

Evonic already runs two-pass evaluation under the hood (Pass 1 answers the prompt, Pass 2 extracts a clean final value before scoring), but that behavior was mostly invisible: no in-app docs, no way to turn it off without editing .env, and no clear signal on the evaluators page that two_pass uses Pass 2.
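For readers unfamiliar with the flow, here is a minimal sketch of the two-pass idea described above. It is not Evonic's actual code: `evaluate_two_pass`, `extract_final_value`, and the `pass1`/`pass2`/`passed` result keys are hypothetical names, and a simple regex stands in for whatever Pass 2 really does to extract the final value.

```python
import re
from typing import Callable, Optional


def extract_final_value(raw_answer: str) -> str:
    """Hypothetical stand-in for Pass 2: keep the last number in the response."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", raw_answer)
    return numbers[-1] if numbers else raw_answer.strip()


def evaluate_two_pass(
    prompt: str,
    expected: str,
    answer_fn: Callable[[str], str],
    two_pass_enabled: bool = True,
) -> dict:
    """Score one prompt, optionally cleaning the answer before comparison."""
    raw = answer_fn(prompt)  # Pass 1: the model answers the prompt
    pass2: Optional[str] = None
    if two_pass_enabled:
        pass2 = extract_final_value(raw)  # Pass 2: extract a clean final value
    final = pass2 if pass2 is not None else raw.strip()
    # Illustrative result shape only; the real result fields may differ.
    return {"pass1": raw, "pass2": pass2, "passed": final == expected}
```

With Pass 2 off, a verbose but correct answer fails an exact-match check in this sketch, which is the kind of difference the toggle makes visible to operators.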

This PR exposes that workflow for maintainers and benchmark operators.

Settings API: GET/PUT /api/settings/two-pass-enabled stores the preference in the app settings table. AnswerExtractor.is_enabled() checks the DB first, then falls back to TWO_PASS_ENABLED from the environment, so toggling in the UI applies on the next evaluation run without a server restart.
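The DB-first, env-fallback lookup can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the `settings` table layout (`name`, `value` columns), the `two_pass_enabled` key, the accepted truthy strings, and the default of enabled when neither source is set are all hypothetical. Only the `TWO_PASS_ENABLED` variable name and the lookup order come from the description above.

```python
import os
import sqlite3
from typing import Mapping, Optional

SETTING_KEY = "two_pass_enabled"  # hypothetical key name in the settings table


def _parse_bool(value: str) -> bool:
    """Treat common truthy strings as True; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def read_db_setting(conn: sqlite3.Connection, key: str) -> Optional[str]:
    """Return the stored value for key, or None if no row exists."""
    row = conn.execute(
        "SELECT value FROM settings WHERE name = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def is_enabled(
    conn: sqlite3.Connection, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Check the DB first, then fall back to TWO_PASS_ENABLED in the environment."""
    env = os.environ if env is None else env
    stored = read_db_setting(conn, SETTING_KEY)
    if stored is not None:
        return _parse_bool(stored)
    # Assumption: two-pass defaults to enabled when nothing is configured.
    return _parse_bool(env.get("TWO_PASS_ENABLED", "true"))
```

Because the lookup runs each time it is called rather than once at import, a value written by the UI would be picked up on the next evaluation run, which matches the no-restart behavior described above.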

Evaluators UI (/evaluate/evaluators) — A panel at the top explains what Pass 2 does and includes an enable/disable toggle. Built-in evaluators that use Pass 2 (e.g. two_pass) show a small Pass 2 badge in the list.

Documentation — Adds docs/two-pass-evaluation.md (flow, config, result fields, how to disable) and serves it in-app at /evaluate/docs/two-pass with a link from the evaluators page.

No change to extraction prompts or scoring logic; this is documentation and operator controls only.

Validation

  • python -m pytest tests/test_answer_extractor.py::TestTwoPassEnabled tests/test_evaluators.py::TestTwoPassEvaluator -q — 4 passed
  • Manual: open /evaluate/evaluators, toggle two-pass off/on, confirm PUT /api/settings/two-pass-enabled succeeds
  • Manual: open /evaluate/docs/two-pass and confirm the guide renders
  • Manual (optional): run a math/reasoning evaluation and confirm pass2 metadata still appears in results when enabled

Co-authored-by: Cursor <cursoragent@cursor.com>
jankric pushed a commit to jankric/evonic that referenced this pull request May 16, 2026
…prevent reassignment bug

The function declaration at original line 1354 could reassign showTab at
runtime, overwriting the first override at line 1216. Moving the declaration
to before both overrides ensures the wrapping chain works correctly:
  showTab -> wrapper2 -> wrapper1 -> original