WS3 — enrichment evaluation harness #8

@VGonPa

A test bench for measuring the quality of the enrichment system. It is distinct from the mechanical validator, which is a per-run integrity gate: this harness measures whether summaries, topic assignments and overviews are actually good, so that configurations, rubrics and executors can be compared against each other and regressions caught.
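
As a rough illustration of what "regressions caught" could mean in practice, the sketch below compares per-layer quality scores from two configurations. The function name `find_regressions`, the 0-to-1 score scale, the layer keys and the `tolerance` default are all assumptions made for this example; none of them come from the existing design.

```python
from typing import Dict


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,  # hypothetical noise margin, not a designed value
) -> Dict[str, float]:
    """Return the layers whose score dropped by more than `tolerance`.

    Both arguments map a layer name to a mean quality score in [0, 1].
    Layers missing from `candidate` are ignored rather than flagged.
    """
    regressions: Dict[str, float] = {}
    for layer, base_score in baseline.items():
        new_score = candidate.get(layer)
        if new_score is None:
            continue
        drop = base_score - new_score
        if drop > tolerance:
            regressions[layer] = drop
    return regressions


# Hypothetical scores for two configurations of the enrichment system.
baseline_scores = {"item_summary": 0.80, "topic_assignment": 0.90}
candidate_scores = {"item_summary": 0.70, "topic_assignment": 0.895}
regressed = find_regressions(baseline_scores, candidate_scores)
```

With these made-up numbers only `item_summary` is reported, because the small dip in `topic_assignment` stays inside the tolerance.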

Scope

  • Layered evaluation: item summaries, topic assignment, topic-page overviews.
  • A gold set produced by a frontier model + an LLM-as-judge scorer.
  • An action loop that feeds eval findings back into the declarative rubrics.
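
A minimal sketch of how the layers, the gold set and the judge might fit together. Everything named here is hypothetical: `Layer`, `GoldItem`, `Judge`, `score_layers` and the 0-to-1 score scale are assumptions for illustration only, and `exact_match_judge` is a trivial stand-in for a real LLM-as-judge call.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Callable, Dict, List


class Layer(Enum):
    """The three evaluation layers listed in the scope."""
    ITEM_SUMMARY = "item_summary"
    TOPIC_ASSIGNMENT = "topic_assignment"
    TOPIC_OVERVIEW = "topic_overview"


@dataclass(frozen=True)
class GoldItem:
    layer: Layer
    item_id: str
    reference: str  # gold output, assumed to come from a frontier model


# A judge takes (candidate, reference) and returns a score in [0, 1].
Judge = Callable[[str, str], float]


def score_layers(
    gold: List[GoldItem],
    candidates: Dict[str, str],
    judge: Judge,
) -> Dict[Layer, float]:
    """Average the judge's scores per layer over the gold set.

    Assumption: a gold item with no candidate output scores 0.0.
    """
    per_layer: Dict[Layer, List[float]] = {}
    for item in gold:
        candidate = candidates.get(item.item_id)
        score = 0.0 if candidate is None else judge(candidate, item.reference)
        per_layer.setdefault(item.layer, []).append(score)
    return {layer: mean(scores) for layer, scores in per_layer.items()}


def exact_match_judge(candidate: str, reference: str) -> float:
    """Placeholder judge: case-insensitive exact match."""
    same = candidate.strip().lower() == reference.strip().lower()
    return 1.0 if same else 0.0


# Made-up gold set and system outputs, for illustration only.
gold_set = [
    GoldItem(Layer.ITEM_SUMMARY, "a", "Hello"),
    GoldItem(Layer.ITEM_SUMMARY, "b", "World"),
    GoldItem(Layer.TOPIC_ASSIGNMENT, "c", "ml"),
]
outputs = {"a": "hello ", "b": "nope", "c": "ML"}
layer_scores = score_layers(gold_set, outputs, exact_match_judge)
```

Keeping the judge as a plain callable means a cheap deterministic scorer can be used in tests while a model-backed judge is swapped in for real runs. The per-layer averages are the kind of finding the action loop could feed back into the rubrics.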

Status

Designed, not yet planned. To be built once the WS2 enrichment layers exist (they now do).
