2 changes: 1 addition & 1 deletion BACKLOG.md
@@ -112,7 +112,7 @@ If any seat would be confused, the component fails.
 - "Agent architecture patterns" — ○ ○ ○
 - "API reference" — ○ ○ ○
 - "How rewards work" — ○ ○ ○
-- "Anti-gaming guarantees" — ○ ○ ○
+- "Anti-gaming guarantees" — ● ● ● — step 368: each of 7 bullets gets cursor-help + tooltip citing canonical source (run_eval_pool, similarity script, sandbox, FORGE_MODEL_WHITELIST, fea, generator). Bullet 2 fixed from false "all 45 problems" claim to accurate "3 of 45, one random per round". Marginal-gain callout untouched (already good).

### Cross-cutting

4 changes: 2 additions & 2 deletions dist/index.html
@@ -61,9 +61,9 @@
 };
 }
 </script>
-<script type="module" crossorigin src="/assets/index-brl2HXQT.js"></script>
+<script type="module" crossorigin src="/assets/index-DD-pJW96.js"></script>
 <link rel="modulepreload" crossorigin href="/assets/react-vendor-W1izUqcL.js">
-<link rel="stylesheet" crossorigin href="/assets/index-DKTPzT-J.css">
+<link rel="stylesheet" crossorigin href="/assets/index-D1Eh-9SG.css">
</head>
<body>
<noscript>
41 changes: 31 additions & 10 deletions src/components/QuickstartGuide.tsx
@@ -837,17 +837,38 @@ git push mine your-name/my-design

 <ul className="text-forge-muted text-sm space-y-1.5">
 {[
-"Determinism check — first problem runs twice, scores must match exactly",
-"Full coverage — final scoring runs all 45 problems, no sampling variance",
-"Duplicate detection — same commit hash is never scored twice",
-"Similarity check — agents must not copy existing agents' code",
-"LLM calls whitelisted — model fixed by harness, agents cannot self-select models",
-"60s / 4GB limits — prevents brute-force search",
-"Seeds fixed — geometry and mesh generation are deterministic across runs",
-].map((item) => (
-<li key={item} className="flex items-start gap-2">
+{
+label: "Determinism check — first problem runs twice, scores must match exactly",
+tip: "Stops stochastic agents that hit the right answer by luck — if rerunning the same spec yields a different score, the submission is rejected. See scripts/run_eval_pool.py L47–49.",
+},
+{
+label: "Three-round sampling — each PR scores 3 of 45 problems, one random spec per round",
+tip: "Sampling 1 spec from each of the 3 active rounds keeps CI under ~20 min and forces breadth (all 3 categories factor in). Per-PR variance is the explicit tradeoff — consistent winners must beat noise across many submissions.",
+},
+{
+label: "Duplicate detection — same commit hash is never scored twice",
+tip: "Blocks the cheap attack of re-submitting an identical commit until the random sample lands on specs you happen to be strong on.",
+},
+{
+label: "Similarity check — agents must not copy existing agents' code",
+tip: "Source-similarity scan in scripts/check_source_similarity.py rejects near-clones at PR review — you can fork the SOTA agent, but you must actually change it.",
+},
+{
+label: "LLM calls whitelisted — model fixed by harness, agents cannot self-select models",
+tip: "forge/sdk/llm.py enforces FORGE_MODEL_WHITELIST: any model not on the curated list is rejected. Prevents a money-arms-race where the richest miner buys the biggest model.",
+},
+{
+label: "60s / 4GB limits — prevents brute-force search",
+tip: "benchmark/sandbox.py caps wall-clock and RAM per run. An agent can't grid-search thousands of geometries inside the eval window — it has to be smart, not loud.",
+},
+{
+label: "Seeds fixed — geometry and mesh generation are deterministic across runs",
+tip: "specs/generator.py derives per-spec seeds from the master seed, and benchmark/fea.py seeds the FEA mesh by spec id. Same spec → identical geometry and identical FEA across machines and reruns.",
+},
+].map(({ label, tip }) => (
+<li key={label} className="flex items-start gap-2">
 <span className="text-forge-green mt-0.5">+</span>
-<span>{item}</span>
+<span className="cursor-help border-b border-dotted border-forge-muted/40" title={tip}>{label}</span>
</li>
))}
</ul>
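Review note: the "three-round sampling" and "determinism check" rules described in the new tooltips can be illustrated with a short TypeScript sketch. This is an illustration only, not the harness code (the tooltips point to scripts/run_eval_pool.py for the real determinism check). `Spec`, `sampleOnePerRound`, `isDeterministic`, the `mulberry32` PRNG, and the even split of 15 specs per round are all hypothetical names and assumptions introduced for the example.

```typescript
// Hypothetical sketch: every name here is an assumption, not part of the harness.
type Spec = { id: string; round: number };

// Small seeded PRNG (mulberry32) so the same seed always produces the same draws.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick exactly one spec from each round, using a seeded PRNG for reproducibility.
function sampleOnePerRound(specs: Spec[], seed: number): Spec[] {
  const rand = mulberry32(seed);
  const byRound = new Map<number, Spec[]>();
  for (const spec of specs) {
    const bucket = byRound.get(spec.round) ?? [];
    bucket.push(spec);
    byRound.set(spec.round, bucket);
  }
  // Visit rounds in ascending order so the result does not depend on input order.
  return Array.from(byRound.keys())
    .sort((a, b) => a - b)
    .map((round) => {
      const bucket = byRound.get(round)!;
      return bucket[Math.floor(rand() * bucket.length)];
    });
}

// Determinism check: score the same spec twice and require an exact match.
function isDeterministic(score: (spec: Spec) => number, spec: Spec): boolean {
  return score(spec) === score(spec);
}

// Assumed pool layout: 3 rounds x 15 specs = 45 specs.
const pool: Spec[] = [];
for (let round = 1; round <= 3; round++) {
  for (let i = 0; i < 15; i++) {
    pool.push({ id: `r${round}-s${i}`, round });
  }
}

const picked = sampleOnePerRound(pool, 368);
```

With this layout `picked` always holds three specs, one per round, which matches the "3 of 45" wording in the corrected bullet. The exact-equality comparison in `isDeterministic` mirrors the "scores must match exactly" label; a tolerance-based comparison would be a different rule.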