Replace Step 5 CI sampling claim with canonical + fix label conflation #299

Open
PunchTheDev wants to merge 1 commit into main from punch/guide-step5-submit

@PunchTheDev (Owner)

Summary

Step 5 of the Quickstart Guide had a factually wrong CI sampling claim, conflated two distinct PR labels, and was ambiguous about when full scoring runs. This PR aligns the section with the canonical sources (eval.yml, score.yml) and surfaces both labels accurately.

Motivation

Three issues found by checking the copy against the canonical sources:

  1. CI sampling: the copy said CI runs a quick check ("1 easy problem per category"). Per .github/workflows/eval.yml L150 and scripts/run_eval_pool.py, CI actually runs 3 specs: one random spec per round (any tier), with the first spec run twice as a determinism check. This is the same shape as the anti-gaming sampling already explained in the Anti-gaming section.

  2. PR label conflation: the copy said the optimization label is applied if your agent "passes all three categories". Per eval.yml L338–354, that is the behavior of the passed label. The optimization label is actually applied when the PR beats SOTA in at least one category. The copy collapsed the two into one, and the passed label was missing entirely.

  3. "Full scoring runs automatically" timing: the copy was ambiguous about whether this happens per PR or after merge. Per .github/workflows/score.yml L3, the full 45-spec eval runs only after a PR is merged to main.
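The CI sampling shape from item 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the spec pool is a mapping from round name to spec IDs of any tier; `sample_ci_specs`, `run_ci`, and the pool layout are hypothetical and are not taken from scripts/run_eval_pool.py, which may seed, order, or compare results differently.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch; not the code in scripts/run_eval_pool.py.


def sample_ci_specs(pool: Dict[str, List[str]], rng: random.Random) -> List[str]:
    """Pick one spec at random from each round, ignoring tier."""
    # Sort round names so the sampled order is stable for a given seed.
    return [rng.choice(specs) for _, specs in sorted(pool.items())]


def run_ci(
    pool: Dict[str, List[str]],
    run_spec: Callable[[str], float],
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Run one sampled spec per round, then re-run the first as a determinism check."""
    rng = random.Random(seed)
    specs = sample_ci_specs(pool, rng)
    results = [(spec, run_spec(spec)) for spec in specs]
    first_spec, first_result = results[0]
    # The first spec runs a second time; any difference fails the check.
    if run_spec(first_spec) != first_result:
        raise RuntimeError(f"non-deterministic result on {first_spec}")
    return results
```

With a three-round pool this yields 3 sampled specs and 4 runs in total, since only the first spec is repeated.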
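The two label rules from item 2 are independent checks, which a small Python sketch makes explicit. This is a hedged illustration, not the logic in eval.yml L338–354: the function name `pr_labels`, the per-category mappings, and the assumption that a higher score beats SOTA are all hypothetical, and the workflow may apply further conditions.

```python
from typing import Mapping, Set

# Hypothetical sketch; eval.yml's real conditions may differ.


def pr_labels(
    passed: Mapping[str, bool],   # category -> did the agent pass it?
    scores: Mapping[str, float],  # category -> this PR's score (higher assumed better)
    sota: Mapping[str, float],    # category -> current SOTA score
) -> Set[str]:
    labels: Set[str] = set()
    # "passed": every category must pass (an empty mapping earns nothing).
    if passed and all(passed.values()):
        labels.add("passed")
    # "optimization": beating SOTA in at least one category is enough.
    if any(scores[c] > sota[c] for c in scores if c in sota):
        labels.add("optimization")
    return labels
```

Under this shape a PR can carry passed alone, optimization alone, both, or neither, which is the distinction the old copy collapsed.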

Changes

Screenshots

Verified with Puppeteer at 1440×900 on /guide#submit: all 5 dotted-underline tooltips render (199/216/194/186/221 chars), the 3 <code> chips (passed, optimization, score.yml) are present, the routed #anti-gaming and /rankings anchors are present, and the old phrases "1 easy problem per category" and "passes all three categories" are gone. 0 new console errors.
