Replace Step 5 CI sampling claim with canonical + fix label conflation #299
Open
PunchTheDev wants to merge 1 commit into
Summary
Step 5 of the Quickstart Guide had a factually wrong CI sampling claim and conflated two distinct PR labels. This PR aligns the section with the canonical sources (`eval.yml`, `score.yml`) and surfaces both labels accurately.

Motivation
Two correctness issues and one ambiguity found by canonical-source check:

- CI sampling: the copy said "CI runs a quick check (1 easy problem per category)". Per `.github/workflows/eval.yml` L150 and `scripts/run_eval_pool.py`, CI actually runs 3 specs: one random per round (any tier), with 2× determinism on the first. This is the same shape as the anti-gaming sampling already explained in the Anti-gaming section.
- PR label conflation: the copy said the `optimization` label is applied "if your agent passes all three categories". Per `eval.yml` L338–354, that is the `passed` label's behavior. The `optimization` label is actually applied when the PR "beats SOTA in at least one category". The two were collapsed into one, and the `passed` label was missing entirely.
- "Full scoring runs automatically" timing: the copy was ambiguous about whether full scoring runs per PR or post-merge. Per `.github/workflows/score.yml` L3, the all-45-spec eval runs only after the PR merges to main.

Changes
- `QuickstartGuide.tsx` L721–757:
  - Lead rewritten around the canonical CI sampling shape; routed `#anti-gaming ↓` link added; 199-char "CI" tooltip + 216-char "3-spec pool sample" tooltip.
  - `<ul>` replaces the single misleading sentence: `passed` (194-char tooltip) + `optimization` (186-char tooltip), each accurate per `eval.yml` L338–339.
  - `score.yml` post-merge framing (221-char tooltip) with routed `/rankings → overall_score` link.
- `BACKLOG.md`: bundled "Step 1 → Step 5" row split into 5 individual rows, matching the split shape of prior PRs #295 (Frame Guide Step 2 with CLI vs API roles and filter hints), #296 (Frame Guide Step 1 by canonical Docker vs Native paths), #297 (Surface fork CTA and sandbox limits in Guide Step 3), and #298 (Replace Step 4 generic copy with canonical 4-stage pipeline); Step 5 flipped ○ ○ ○ → ● ● ● with rationale.

Screenshots
Puppeteer-verified at 1440×900 on `/guide#submit`: all 5 dotted-underline tooltips render (199/216/194/186/221 chars), 3 `<code>` chips (`passed`, `optimization`, `score.yml`), routed `#anti-gaming` + `/rankings` anchors present, old "1 easy problem per category" and "passes all three categories" phrases gone. 0 new console errors.
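
For reviewers, the label distinction described under Motivation can be sketched roughly as below. This is only an illustration of the two rules as stated in this PR, not the logic in `eval.yml`: the `CategoryResult` type, its field names, and the `labels_for` helper are hypothetical, and the sketch assumes the two labels are decided independently of each other, which the workflow may or may not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryResult:
    """Outcome of one category for a PR (hypothetical shape, not from eval.yml)."""
    name: str
    passed: bool      # the agent passed this category
    beats_sota: bool  # the agent beat the current SOTA in this category


def labels_for(results: list[CategoryResult]) -> set[str]:
    """Return the PR labels implied by the per-category results."""
    labels: set[str] = set()
    # `passed`: every category must pass (an empty result list earns nothing).
    if results and all(r.passed for r in results):
        labels.add("passed")
    # `optimization`: beating SOTA in at least one category is enough.
    # Assumption: this does not additionally require the `passed` label.
    if any(r.beats_sota for r in results):
        labels.add("optimization")
    return labels


if __name__ == "__main__":
    # Placeholder category names; the real category names are not given here.
    example = [
        CategoryResult("category_a", passed=True, beats_sota=False),
        CategoryResult("category_b", passed=True, beats_sota=True),
        CategoryResult("category_c", passed=True, beats_sota=False),
    ]
    print(sorted(labels_for(example)))
```

Under these assumptions, a PR that passes all three categories without beating SOTA gets only `passed`, which is the case the old copy mislabeled as `optimization`.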