Replace Step 4 generic copy with canonical 4-stage pipeline #298

Open
PunchTheDev wants to merge 1 commit into main from punch/guide-step4-stage-pipeline

@PunchTheDev
Owner

Summary

Step 4 of the Guide (/guide#eval) was generic: "Test your agent before submitting" + a code block + a 4-item passing list that did not match the actual benchmark/evaluate.py pipeline. This rewrites the section around the canonical 4 stages (agent → geometry → fea → similarity) and frames why a local run matters.

Motivation

  • The "passing result means" list lumped 3 geometry checks into items 1-3 and missed Stage 1 (agent) and Stage 4 (similarity) entirely.
  • "bolt hole clearance" was factually wrong — _check_bolt_holes in benchmark/geometry.py ray-shoots to verify hole presence at the bolt_pattern_mm centers, not clearance.
  • The lead never explained why you'd run locally before submitting (per-PR CI runs only 3 of 45 problems, so iterate offline first).

Changes

  • Lead rewritten: names the 3-of-45 CI sampling (routed to #anti-gaming) and forge eval --docker parity with the forge-eval:latest GHCR image (209-char tooltip).
  • Bottom paragraph replaced with a 4-stage <ol> from benchmark/evaluate.py:
    • agent (221-char tip — sandbox + STEP file return)
    • geometry (373-char tip — 4 sequential checks, each named with the canonical spec field)
    • fea (323-char tip — von Mises vs yield_stress_mpa / safety_factor)
    • similarity (261-char tip — check_source_similarity.py; routed to #anti-gaming)
  • The forge status command line gains a --spec flag for clarity.
  • BACKLOG: split the bundled "Step 1 → Step 5" line into 5 individual rows; flipped Step 4 to ● ● ●.
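
For reviewers who want the stage ordering and the fea criterion in concrete form, here is a minimal Python sketch. It is not the code in benchmark/evaluate.py. The stage names and the yield_stress_mpa / safety_factor comparison come from this PR. The function names `fea_passes` and `run_pipeline`, the stop-on-first-failure behavior, and the use of `<=` rather than `<` are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Stage order as listed in this PR; everything else below is hypothetical.
STAGES = ("agent", "geometry", "fea", "similarity")


def fea_passes(max_von_mises_mpa: float,
               yield_stress_mpa: float,
               safety_factor: float) -> bool:
    """Return True when peak von Mises stress is within the allowable stress.

    Allowable stress is yield_stress_mpa / safety_factor. Treating the
    boundary case as a pass (<=) is an assumption.
    """
    if safety_factor <= 0:
        raise ValueError("safety_factor must be positive")
    allowable_mpa = yield_stress_mpa / safety_factor
    return max_von_mises_mpa <= allowable_mpa


def run_pipeline(checks: Dict[str, Callable[[], bool]]) -> List[Tuple[str, bool]]:
    """Run the four stages in order and record each result.

    Stopping at the first failing stage is an assumption, not something
    this PR states about the real pipeline.
    """
    results: List[Tuple[str, bool]] = []
    for name in STAGES:
        ok = checks[name]()
        results.append((name, ok))
        if not ok:
            break
    return results
```

With a 250 MPa yield stress and a safety factor of 2, the allowable stress is 125 MPa, so a peak of 100 MPa passes and a peak of 130 MPa fails under this sketch.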

Verified live at localhost:8080/guide#eval: 4 stages render, all tooltips populated, both #anti-gaming links navigate, no new console errors.
