Show and tell: Trace-to-Training — compliance-evaluated agent runs as training data #3
telleroutlook started this conversation in Show and tell
Third post in the WasmAgent series. This one covers what happens after the agent run: how the execution record becomes structured training data.
The pipeline
ComplianceEvalRecord is a typed, schema-versioned object: the canonical data contract between WasmAgent's runtime and the downstream training pipeline. It carries the full repair trace, the verifier verdicts, and a reference back to the AEPRecord.
Three verifiers, one verdict
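The post does not list the record's fields or name the three verifiers, so what follows is only a minimal sketch under stated assumptions. The field names (schemaVersion, aepRecordId, repairTrace, finalVerdicts), the version string "1", the helpers makeVerifier, verifyAttempt, buildRecord and isComplianceEvalRecord, and the three toy IFEval-style checks are all hypothetical stand-ins, not the @wasmagent/compliance API. It illustrates the two properties described here: every verifier runs on every attempt, and each verdict is stored separately rather than merged. A type guard sits at the contract boundary and rejects records with an unexpected schema version.

```typescript
// Hypothetical sketch: field names, the schema version, and the three
// verifiers are illustrative assumptions, not the real
// @wasmagent/compliance API.

const SCHEMA_VERSION = "1" as const; // assumed version tag for the contract

type VerdictStatus = "pass" | "fail";

interface VerifierVerdict {
  verifier: string; // which verifier produced this verdict
  status: VerdictStatus;
}

interface RepairStep {
  attempt: number; // 0 = first try, 1.. = repair attempts
  output: string; // model output for this attempt
  verdicts: VerifierVerdict[]; // one entry per verifier, never merged
}

interface ComplianceEvalRecord {
  schemaVersion: typeof SCHEMA_VERSION;
  aepRecordId: string; // reference back to the AEPRecord
  repairTrace: RepairStep[]; // full repair trace, in attempt order
  finalVerdicts: VerifierVerdict[]; // verdicts on the last attempt
}

interface Verifier {
  name: string;
  check: (output: string) => VerifierVerdict;
}

// Wrap a boolean predicate as a named verifier that emits a verdict.
function makeVerifier(name: string, predicate: (output: string) => boolean): Verifier {
  return {
    name,
    check: (output) => ({ verifier: name, status: predicate(output) ? "pass" : "fail" }),
  };
}

// Every verifier runs on every attempt; there is no short-circuit on failure.
function verifyAttempt(output: string, verifiers: Verifier[]): VerifierVerdict[] {
  return verifiers.map((v) => v.check(output));
}

// Build one record from the sequence of attempt outputs for a single run.
function buildRecord(aepRecordId: string, outputs: string[], verifiers: Verifier[]): ComplianceEvalRecord {
  const repairTrace: RepairStep[] = outputs.map((output, attempt) => ({
    attempt,
    output,
    verdicts: verifyAttempt(output, verifiers),
  }));
  const last = repairTrace[repairTrace.length - 1];
  return {
    schemaVersion: SCHEMA_VERSION,
    aepRecordId,
    repairTrace,
    finalVerdicts: last ? last.verdicts : [],
  };
}

// Contract boundary check: reject anything with an unexpected shape or version.
function isComplianceEvalRecord(x: unknown): x is ComplianceEvalRecord {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (
    r.schemaVersion === SCHEMA_VERSION &&
    typeof r.aepRecordId === "string" &&
    Array.isArray(r.repairTrace) &&
    Array.isArray(r.finalVerdicts)
  );
}

// Usage with three toy IFEval-style checks standing in for the real verifiers.
const verifiers: Verifier[] = [
  makeVerifier("lowercase", (s) => s === s.toLowerCase()),
  makeVerifier("max_words", (s) => s.trim().split(/\s+/).length <= 5),
  makeVerifier("no_comma", (s) => !s.includes(",")),
];

const record = buildRecord("aep-001", ["Hello, World", "hello world"], verifiers);
console.log(record.finalVerdicts.map((v) => `${v.verifier}=${v.status}`).join(" "));
// prints "lowercase=pass max_words=pass no_comma=pass"
console.log(isComplianceEvalRecord(JSON.parse(JSON.stringify(record)))); // true
```

Keeping the verdicts separate lets a downstream consumer decide later how to combine them (for example, requiring all three to pass before a trace is used for training) without re-running the verifiers.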
All three run per attempt. The final ComplianceEvalRecord includes each verifier's verdict independently.
Empirical result
On IFEval × Qwen2.5-1.5B-Q4, full_pcl mode achieves 54.7% ±1.2pp vs prompt_retry at 46.0% ±2.0pp (+8.7pp, 3 seeds × 50 samples).
Reproducible:
bun packages/compliance/benchmarks/ifeval/run.ts --limit=50 --seed=42
Repo
packages/compliance (@wasmagent/compliance, alpha)
packages/core/src/enhancement: RolloutForkRunner
packages/core/src/ranking: RolloutRanker
Questions:
full_pcl vs prompt_retry tradeoff: when would you prefer the simpler mode?