Verify multi-spec pool eval across all 3 categories by PunchTheDev · Pull Request #165 · PunchTheDev/forge

PunchTheDev · 2026-06-02T23:23:23Z

Summary

Pipeline verification PR — not a competition submission.

The multi-spec pool eval (PR #164) runs one easy spec from each of the 3 active rounds simultaneously: mass optimization (round_001), stiffness-to-weight (round_002), and deflection (round_003). Neither round_002 nor round_003 has ever been tested end-to-end in CI.

This PR uses a simple L-bracket agent (not competitive) to verify that:

select_eval_specs.py correctly selects one spec from each round
FEA runs and produces displacement output for stiffness/deflection scoring
record_submissions.py correctly records all 3 metric types to the API
The CI comment shows a cross-category table with all 3 rows
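
As a rough illustration of the first check, the sketch below picks one easy spec per round from a pool of spec IDs. It is not the code in select_eval_specs.py: the ID pattern is only inferred from IDs such as `r01_001_easy` in the CI results, and the function name, the "lowest-numbered spec wins" rule, and the sample pool are assumptions.

```python
import re
from typing import Dict, List

# Hypothetical ID layout inferred from IDs like "r01_001_easy":
# r<round>_<index>_<difficulty>
SPEC_ID = re.compile(r"^r(?P<round>\d{2})_(?P<index>\d{3})_(?P<difficulty>[a-z]+)$")


def select_one_per_round(spec_ids: List[str], difficulty: str = "easy") -> Dict[str, str]:
    """Return the lowest-numbered spec of the given difficulty for each round."""
    chosen: Dict[str, str] = {}
    for spec_id in sorted(spec_ids):
        match = SPEC_ID.match(spec_id)
        if match is None or match["difficulty"] != difficulty:
            continue  # skip malformed IDs and other difficulty tiers
        round_name = f"round_{int(match['round']):03d}"
        # sorted() guarantees the first hit per round has the lowest index
        chosen.setdefault(round_name, spec_id)
    return chosen


# Hypothetical pool; only the three *_001_easy IDs appear in this PR.
pool = ["r02_001_easy", "r01_002_easy", "r01_001_easy", "r03_001_easy", "r01_001_hard"]
selection = select_one_per_round(pool)
```

With this pool, `selection` maps round_001, round_002 and round_003 to the three spec IDs that show up in the cross-category table.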

After CI passes

Close this PR without merging — it is for verification only.


github-actions · 2026-06-02T23:39:48Z

Forge Eval — PASSED ✅

Cross-Category Results

Status	Category	Score	Baseline	vs Baseline	Current SOTA
✅ Mass Optimization ↓	`r01_001_easy`	276.87 g	263.20 g	--5.2%	—
✅ Stiffness/Weight ↑	`r02_001_easy`	3.4197 N/(mm·g)	258.9620 N/(mm·g)	+-98.7%	—
✅ Absolute Stiffness ↓	`r03_001_easy`	0.1291 mm	0.0022 mm	--5818.8%	—

Composite score: 4532.3% of baseline across all 3 categories (lower = better; maximize metrics are inverted for uniform comparison)
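
The CI's exact composite formula is not shown in this PR. One reading that is consistent with the reported figures is an arithmetic mean of per-category score-to-baseline ratios, with the maximize metric inverted so that lower is better everywhere. The sketch below assumes that reading; the function names are hypothetical. Because the table shows rounded values (notably the 0.0022 mm baseline), recomputing from the table lands close to, but not exactly on, the reported 4532.3%.

```python
from statistics import mean
from typing import Iterable, Tuple

# (score, baseline, lower_is_better)
Result = Tuple[float, float, bool]


def ratio_to_baseline(score: float, baseline: float, lower_is_better: bool) -> float:
    """Ratio where 1.0 equals the baseline and smaller is always better."""
    return score / baseline if lower_is_better else baseline / score


def composite_percent(results: Iterable[Result]) -> float:
    """Assumed composite: mean of the per-category ratios, as a percentage."""
    return 100.0 * mean(ratio_to_baseline(s, b, low) for s, b, low in results)


# Values copied from the cross-category table above (rounded as displayed).
results = [
    (276.87, 263.20, True),     # mass, g (minimize)
    (3.4197, 258.9620, False),  # stiffness/weight, N/(mm*g) (maximize)
    (0.1291, 0.0022, True),     # deflection, mm (minimize)
]
print(f"{composite_percent(results):.1f}% of baseline")
```

Under this reading the composite is dominated by the two stiffness categories, where the L-bracket agent is tens of times worse than the baseline, while the mass ratio contributes only about 1.05.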

Passes all three categories but does not beat the current SOTA. Keep optimizing.

PunchTheDev · 2026-06-02T23:41:46Z

Pipeline verified ✓ All 3 categories pass end-to-end: mass (r01), stiffness-to-weight (r02), deflection (r03). Closing without merging — this was a verification-only agent. The step-out write fix and diagnostic improvements were merged to main via f55e968.

Punch and others added 5 commits June 2, 2026 23:23

Add eval-verify agent to test multi-spec pool CI across all 3 categories

b3b8a90

Fix eval-verify agent to fit within any spec build volume

fd8510c

Center arm at load point Z to ensure FEA load node coverage

e05dcc4

Improve pool eval error reporting: print Docker stderr on empty output

7d47846

Also bump pids-limit 256→512 — OCP shape ops can spawn many threads. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' into punch/verify-pool-eval-all-categories

e8e7a3d
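
The two changes described in commit 7d47846 (printing Docker stderr when the container produces no output, and raising the process limit to 512) can be sketched roughly as follows. This is not the repository's code: the helper names and the image name in the usage line are hypothetical. `--pids-limit` is a standard `docker run` option.

```python
import subprocess
import sys
from typing import List, Sequence


def build_docker_cmd(image: str, args: Sequence[str], pids_limit: int = 512) -> List[str]:
    """Build a `docker run` command with a cap on processes/threads."""
    return ["docker", "run", "--rm", f"--pids-limit={pids_limit}", image, *args]


def run_and_report(cmd: Sequence[str]) -> str:
    """Run cmd; if stdout is empty, surface stderr instead of failing silently."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if not proc.stdout.strip():
        print(
            f"empty output from {cmd[0]} (exit code {proc.returncode}); stderr follows:",
            file=sys.stderr,
        )
        print(proc.stderr, file=sys.stderr)
    return proc.stdout


# Hypothetical image name and arguments, for illustration only.
cmd = build_docker_cmd("forge-eval:latest", ["r01_001_easy"])
```

Passing `cmd` to `run_and_report` would then print the container's stderr whenever the eval produced nothing on stdout, which is the situation the commit message describes.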

github-actions Bot added the `passed` label (PR passes all three eval categories) Jun 2, 2026

PunchTheDev closed this Jun 2, 2026

PunchTheDev deleted the punch/verify-pool-eval-all-categories branch June 3, 2026 03:39
