Show and tell: Trace-to-Training — compliance-evaluated agent runs as training data #3

telleroutlook (Maintainer) · 2026-06-26T01:48:37Z

Third post in the WasmAgent series. This one covers what happens after the agent run: how the execution record becomes structured training data.

The pipeline

Agent run → AEPRecord → ComplianceVerifier → RepairPlanner → ComplianceEvalRecord → training
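To make the flow concrete, here is a minimal sketch of a verify-and-repair loop, assuming the stages connect as a bounded retry cycle. Every name in it (runWithRepair, AttemptResult, the injected generate/verify/planRepair functions, the "Repair hint" prompt suffix) is hypothetical and is not the @wasmagent/compliance API. The calls are synchronous only to keep the example short; real model calls would be async.

```typescript
// Hypothetical sketch of the verify/repair cycle. Not the actual WasmAgent API.
interface AttemptResult {
  attempt: number;
  output: string;
  passed: boolean;
  repairHint?: string; // present only when the attempt failed
}

interface LoopResult {
  attempts: AttemptResult[]; // the full trace, one entry per attempt
  compliant: boolean;
}

function runWithRepair(
  generate: (prompt: string) => string,
  verify: (output: string) => boolean,
  planRepair: (output: string) => string,
  prompt: string,
  maxAttempts = 3,
): LoopResult {
  const attempts: AttemptResult[] = [];
  let currentPrompt = prompt;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = generate(currentPrompt);
    const passed = verify(output);

    if (passed) {
      attempts.push({ attempt, output, passed });
      return { attempts, compliant: true };
    }

    // Failed: ask the planner for a hint and fold it into the next prompt.
    const repairHint = planRepair(output);
    attempts.push({ attempt, output, passed, repairHint });
    currentPrompt = `${prompt}\n\nRepair hint: ${repairHint}`;
  }

  return { attempts, compliant: false };
}
```

The point of the sketch is that failed attempts stay in the trace together with the hint that followed them, so the downstream consumer sees the whole repair path and not only the final output.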

ComplianceEvalRecord is a typed, schema-versioned object that serves as the canonical data contract between WasmAgent's runtime and the downstream training pipeline. It carries the full repair trace, the verifier verdicts, and a reference back to the AEPRecord.
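For readers who want a picture of what such a contract might look like, here is a rough sketch. The field names (schemaVersion, aepRecordId, repairTrace, finalVerdicts), the semver-style version string, and the isSupportedRecord guard are all assumptions made for illustration; the real schema lives in packages/compliance and may differ.

```typescript
// Hypothetical shape; field names are illustrative, not the published schema.
type VerifierName = "ifeval" | "deterministic" | "llm_judge";

interface VerifierVerdict {
  verifier: VerifierName;
  passed: boolean;
  detail?: string;
}

interface RepairStep {
  attempt: number;
  verdicts: VerifierVerdict[];
  repairInstruction?: string; // what the planner asked for next, if anything
}

interface ComplianceEvalRecordSketch {
  schemaVersion: string; // assumed semver-like, e.g. "1.0.0"
  aepRecordId: string; // reference back to the AEPRecord
  repairTrace: RepairStep[];
  finalVerdicts: VerifierVerdict[];
}

const SUPPORTED_MAJOR = 1;

// A consumer-side guard: reject records whose major version is unknown
// before they reach the training pipeline.
function isSupportedRecord(value: unknown): value is ComplianceEvalRecordSketch {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  if (typeof r.schemaVersion !== "string") return false;
  const major = Number.parseInt(r.schemaVersion.split(".")[0] ?? "", 10);
  return (
    major === SUPPORTED_MAJOR &&
    typeof r.aepRecordId === "string" &&
    Array.isArray(r.repairTrace) &&
    Array.isArray(r.finalVerdicts)
  );
}
```

Checking the version at the boundary is one way to let the runtime and the training side evolve separately without silently feeding mismatched records into a dataset.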

Three verifiers, one verdict

IFEvalVerifier — 15 instruction-following classes (word count, format, language, sections, etc.), all deterministic
DeterministicVerifier — 7 structural checks
LLMJudgeVerifier — adversarial binary judgment via a judge model

All three verifiers run on every attempt. The final ComplianceEvalRecord records each verifier's verdict independently.
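A small sketch of what "each verdict independently" could mean in code, assuming a common verifier interface. The Verifier and Verdict types, evaluateAttempt, and the three toy checks are hypothetical stand-ins; in particular the LLMJudgeVerifier stub is a string check here, whereas a real judge would call a model and would be async.

```typescript
// Hypothetical interface; the real verifier classes may look different.
type Verdict = { verifier: string; passed: boolean };
type Verifier = { name: string; check: (output: string) => boolean };

// Run every verifier on one attempt and keep each result separately,
// without collapsing them into a single boolean.
function evaluateAttempt(output: string, verifiers: Verifier[]): Verdict[] {
  return verifiers.map((v) => ({ verifier: v.name, passed: v.check(output) }));
}

// Toy stand-ins for the three verifiers named above.
const stubVerifiers: Verifier[] = [
  // Toy instruction-following rule: at most five words.
  { name: "IFEvalVerifier", check: (o) => o.trim().split(/\s+/).length <= 5 },
  // Toy structural check: output is not empty.
  { name: "DeterministicVerifier", check: (o) => o.trim().length > 0 },
  // Placeholder for a judge-model call.
  { name: "LLMJudgeVerifier", check: (o) => !o.includes("TODO") },
];
```

Keeping the verdicts separate means a downstream consumer can decide how to weight or combine them, for example trusting deterministic checks more than the judge.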

Empirical result

On IFEval with Qwen2.5-1.5B-Q4, full_pcl mode reaches 54.7% ±1.2pp versus 46.0% ±2.0pp for prompt_retry, a gain of 8.7pp (3 seeds × 50 samples).

Reproducible: bun packages/compliance/benchmarks/ifeval/run.ts --limit=50 --seed=42
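As a quick sanity check on the headline numbers, the snippet below recomputes the absolute gain and also expresses it relative to the prompt_retry baseline. It uses only the two reported percentages; the helper name gain is made up for this example.

```typescript
// Recompute the gap from the two reported pass rates (in percent).
function gain(treatment: number, baseline: number): { absolutePp: string; relativePct: string } {
  const diff = treatment - baseline;
  return {
    absolutePp: diff.toFixed(1), // percentage points
    relativePct: ((diff / baseline) * 100).toFixed(1), // percent of the baseline
  };
}

const result = gain(54.7, 46.0);
console.log(result.absolutePp); // "8.7"
console.log(result.relativePct); // "18.9"
```

So the 8.7pp gap corresponds to roughly a 19% relative improvement over prompt_retry on this setup.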

Repo

packages/compliance (@wasmagent/compliance, alpha)
packages/core/src/enhancement — RolloutForkRunner
packages/core/src/ranking — RolloutRanker

Questions:

What reward signals are you using today for RLAIF-style training data generation?
Is the compliance verifier approach (deterministic + LLM judge) sufficient, or do you need domain-specific verifiers?
Thoughts on the full_pcl vs prompt_retry tradeoff — when would you prefer the simpler mode?
