Merged
13 changes: 7 additions & 6 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -4,19 +4,20 @@

## Spec

<!-- Which spec does this target? (e.g., 001_bracket) -->
<!-- Which spec does this target? (e.g., pub_001_medium) -->

## Score

<!-- CI will post your score automatically. Paste it here once CI runs. -->

## Approach

<!-- Brief technical description: topology optimization? parametric? lattice? -->
<!-- Brief technical description: topology optimization? parametric? LLM-guided? lattice? -->

## Checklist

- [ ] `agents/<my-name>/agent.py` implements `generate(spec) -> bytes`
- [ ] Local eval passes: `docker run ... --agent agents/<my-name>/agent.py --spec specs/001_bracket.json`
- [ ] No external network calls in `generate()`
- [ ] Agent is deterministic (same output for same spec)
- [ ] `agents/<my-name>/agent.py` implements `generate(spec) -> bytes` or `generate(spec, llm) -> bytes`
- [ ] `agents/<my-name>/spec.txt` contains the target spec ID (e.g., `pub_001_medium`)
- [ ] Local eval passes: `forge eval agents/<my-name>/agent.py`
- [ ] Agent is deterministic (same spec → same bytes; fix any random seeds)
- [ ] LLM agents: using an injected `LLMClient`, not a hardcoded API key
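
The determinism item in the checklist can be checked locally before opening a PR. The following is a minimal sketch, not part of the harness: `is_deterministic`, `seeded_agent`, and `unseeded_agent` are hypothetical names, and the stand-in agents return placeholder bytes rather than real STEP data.

```python
import hashlib
import os
import random


def is_deterministic(generate, spec: dict, runs: int = 3) -> bool:
    """Call generate(spec) several times and compare SHA-256 digests of the output."""
    digests = {hashlib.sha256(generate(spec)).hexdigest() for _ in range(runs)}
    return len(digests) == 1


def seeded_agent(spec: dict) -> bytes:
    # Hypothetical stand-in agent: a local RNG with a fixed seed gives repeatable bytes.
    rng = random.Random(42)
    return f"thickness={rng.uniform(3.0, 6.0):.6f}".encode()


def unseeded_agent(spec: dict) -> bytes:
    # Hypothetical counter-example: os.urandom returns different bytes on each call.
    return os.urandom(16)
```

Running `is_deterministic(generate, spec)` against your own `generate` is a quick way to catch an unseeded random source. An LLM agent would additionally need the model reply itself to be stable, which this sketch does not address.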
6 changes: 4 additions & 2 deletions .github/workflows/eval.yml
@@ -21,7 +21,7 @@ jobs:
id: agent
run: |
AGENT=$(git diff --name-only origin/${{ github.base_ref }}...HEAD \
| grep '^agents/.*/agent\.py$' | head -1)
| grep '^agents/.*/agent\.py$' | grep -v '^agents/template/' | head -1)
if [ -z "$AGENT" ]; then
echo "No agent.py changed — skipping eval."
echo "found=false" >> "$GITHUB_OUTPUT"
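
The path filter in this step (match `agents/.../agent.py`, drop anything under `agents/template/`, take the first hit) can be mirrored in Python to test which file CI would pick. This is a sketch under that reading of the `grep` pipeline; `pick_agent` is a hypothetical helper, not something the workflow calls.

```python
import re
from typing import List, Optional

# Same pattern as the first grep in the workflow step.
AGENT_RE = re.compile(r"^agents/.*/agent\.py$")


def pick_agent(changed_paths: List[str]) -> Optional[str]:
    """Return the first changed agent.py outside agents/template/, or None."""
    for path in changed_paths:
        if AGENT_RE.match(path) and not path.startswith("agents/template/"):
            return path
    return None
```

With this rule, a PR that only edits `agents/template/agent.py` yields `None`, which corresponds to the "No agent.py changed" branch above.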
@@ -102,10 +102,12 @@ jobs:
STEP_FLAG="--step-out /forge/.forge_step_output.step"
fi
OUT=$(docker run --rm \
--network none \
--security-opt no-new-privileges \
--memory 4g \
--cpus 2 \
-e FORGE_LLM_KEY=${{ secrets.FORGE_LLM_KEY || secrets.OPENROUTER_KEY }} \
-e FORGE_MODEL=${{ secrets.FORGE_MODEL || 'anthropic/claude-haiku-4-5' }} \
-e FORGE_MODEL_WHITELIST=${{ vars.FORGE_MODEL_WHITELIST || 'anthropic/claude-haiku-4-5,anthropic/claude-3-5-haiku,openai/gpt-4o-mini' }} \
-v "${{ github.workspace }}:/forge" \
forge-eval \
--agent /forge/${{ steps.agent.outputs.path }} \
66 changes: 63 additions & 3 deletions CONTRIBUTING.md
@@ -18,19 +18,79 @@ mkdir agents/<your-name>
touch agents/<your-name>/agent.py
```

Implement the `generate` function:
Implement the `generate` function. There are two supported signatures:

**Static agent** (no LLM — backward compatible):
```python
def generate(spec: dict) -> bytes:
    """Return STEP file bytes for a part that satisfies spec."""
    ...
```

**LLM agent** (recommended):
```python
from forge.sdk.llm import LLMClient

def generate(spec: dict, llm: LLMClient) -> bytes:
    """Return STEP file bytes, using the LLM to reason about geometry."""
    ...
```

The harness detects which signature you use via `inspect.signature` and injects
an `LLMClient` automatically — you do not need to provide an API key.
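
A minimal sketch of how signature-based dispatch of this kind can work, assuming the rule is "two or more parameters means an LLM agent". `StubLLM` and `call_agent` are hypothetical stand-ins for illustration; the real client is `forge.sdk.llm.LLMClient`.

```python
import inspect


class StubLLM:
    """Hypothetical stand-in for LLMClient so the sketch runs offline."""

    def chat(self, messages, max_tokens=512):
        return "stub reply"


def call_agent(generate, spec: dict, llm_factory=StubLLM) -> bytes:
    """Pass an LLM client only when generate() declares a second parameter."""
    params = inspect.signature(generate).parameters
    if len(params) >= 2:
        return generate(spec, llm_factory())
    return generate(spec)


def static_agent(spec: dict) -> bytes:
    return b"static"


def llm_agent(spec: dict, llm) -> bytes:
    return llm.chat([{"role": "user", "content": "hi"}]).encode()
```

Under this rule any second parameter triggers injection, including one with a default value, so a static agent should keep to the single-argument form.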

#### Using the LLM client

`LLMClient` wraps the OpenRouter API:

```python
response: str = llm.chat(
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=512,
)
```

The model is chosen by the harness via `FORGE_MODEL`. During CI, only
whitelisted models are accepted:

- `anthropic/claude-haiku-4-5`
- `anthropic/claude-3-5-haiku`
- `openai/gpt-4o-mini`

Miners do not configure the API key or model — the harness injects both.
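
The diff does not show how the whitelist is enforced inside `LLMClient`, so the following is only a sketch of one plausible check, assuming `FORGE_MODEL_WHITELIST` is a comma-separated list as in the workflow. `resolve_model` is a hypothetical helper.

```python
from typing import Mapping

DEFAULT_MODEL = "anthropic/claude-haiku-4-5"
DEFAULT_WHITELIST = (
    "anthropic/claude-haiku-4-5,anthropic/claude-3-5-haiku,openai/gpt-4o-mini"
)


def resolve_model(env: Mapping[str, str]) -> str:
    """Return the configured model, rejecting anything outside the whitelist."""
    model = env.get("FORGE_MODEL", DEFAULT_MODEL)
    raw = env.get("FORGE_MODEL_WHITELIST", DEFAULT_WHITELIST)
    allowed = {name.strip() for name in raw.split(",") if name.strip()}
    if model not in allowed:
        raise ValueError(f"model {model!r} is not in the whitelist")
    return model
```

In practice the mapping passed in would be `os.environ`; taking it as an argument keeps the check easy to test.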

#### Observe → Plan → Act pattern

```python
from forge.sdk.llm import LLMClient
import json

def generate(spec: dict, llm: LLMClient) -> bytes:
    # Observe: extract constraints
    c = spec["constraints"]

    # Plan: ask the LLM to reason about geometry parameters
    raw = llm.chat([{
        "role": "user",
        "content": f"Given build volume {c['build_volume_mm']}, propose arm_length and wall_thickness as JSON."
    }])
    dims = json.loads(raw)

    # Act: build the geometry with build123d
    from build123d import Box, BuildPart
    with BuildPart() as part:
        Box(dims["arm_length"], dims["wall_thickness"], dims["wall_thickness"])

    # ... export to STEP and return bytes
```

See `examples/llm-agent/agent.py` for a complete working example.
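
The plan step above calls `json.loads` directly on the model reply, which fails if the model wraps the JSON in prose or a code fence. A more defensive parser might look like the sketch below. `parse_dims` and the default values are hypothetical, and falling back to defaults is one design choice among several (retrying the call is another).

```python
import json
import re


def parse_dims(raw: str, defaults: dict) -> dict:
    """Parse the span from the first '{' to the last '}' in an LLM reply.

    Keys missing from the reply, or with non-positive or non-numeric values,
    keep the value from `defaults`.
    """
    dims = dict(defaults)
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return dims
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return dims
    for key in defaults:
        value = data.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value > 0:
            dims[key] = float(value)
    return dims
```

Used in place of `json.loads(raw)`, this keeps `generate` returning a valid part even when the reply is malformed.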

The agent runs inside a Docker container with these constraints:
- **Time:** 60 seconds
- **Memory:** 4 GB
- **Network:** disabled
- **Libraries available:** `build123d`, `gmsh`, `numpy`, `scipy`, `OCP`
- **Network:** enabled (required for LLM API calls)
- **Libraries available:** `build123d`, `gmsh`, `numpy`, `scipy`, `OCP`, `httpx`

### 3. Test locally

28 changes: 19 additions & 9 deletions QUICKSTART.md
@@ -75,21 +75,31 @@ curl http://143.244.191.193:8000/specs/001_bracket
cp -r agents/template agents/<your-name>
```

Edit `agents/<your-name>/agent.py`. The only contract:
Edit `agents/<your-name>/agent.py`. Two supported signatures:

**Static agent** (no LLM):
```python
def generate(spec: dict) -> bytes:
    """
    Takes the spec dict (load, bolt pattern, build volume, material).
    Returns STEP file bytes for your design.
    """
    """Takes the spec dict, returns STEP file bytes."""
    ...
```

See `agents/taper-beam/agent.py` for a clean I-beam reference implementation (~38g).
See `agents/lean-arm/agent.py` for the I-beam baseline (~32g).
See `agents/pocket-plate/agent.py` for the wall-pocketing approach (~30g).
See `agents/compact-arm/agent.py` for the current SOTA (~27g).
**LLM agent** (recommended — harness injects the client):
```python
from forge.sdk.llm import LLMClient

def generate(spec: dict, llm: LLMClient) -> bytes:
    """Use the LLM to reason about geometry, then return STEP bytes."""
    response = llm.chat([{"role": "user", "content": "..."}])
    ...
```

No API key needed — the harness injects `LLMClient` automatically using whitelisted models. See `examples/llm-agent/agent.py` for a complete working example.

Reference implementations in `agents/`:
- `taper-beam/` — clean I-beam (~38g)
- `lean-arm/` — I-beam baseline (~32g)
- `compact-arm/` — pocketed arm approach

---

37 changes: 25 additions & 12 deletions README.md
@@ -63,7 +63,7 @@ forge eval agents/<your-name>/agent.py
## Submitting

1. Fork this repo.
2. Create `agents/<your-name>/agent.py` with a `generate(spec: dict) -> bytes` function.
2. Create `agents/<your-name>/agent.py` with a `generate(spec, [llm]) -> bytes` function.
3. Open a PR. CI scores your design automatically (~2 min) and posts:
```
## Forge Eval — NEW LEADER 🏆
@@ -77,18 +77,30 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for full guidelines.

## Agent interface

Two supported signatures — the harness detects which one you use automatically:

**Static agent** (no LLM):
```python
def generate(spec: dict) -> bytes:
    """
    Takes the spec dict (load, bolt pattern, build volume, material).
    Returns STEP file bytes for your design.
    """Build and return STEP file bytes for the given spec."""
    ...
```

Sandbox: 60s timeout, 4GB RAM, no network access.
"""
**LLM agent** (recommended):
```python
from forge.sdk.llm import LLMClient

def generate(spec: dict, llm: LLMClient) -> bytes:
    """Use the LLM to reason about geometry, then return STEP bytes."""
    response = llm.chat([{"role": "user", "content": "..."}])
    ...
```

Libraries available in eval: `build123d`, `OCP`, `gmsh`, `numpy`, `scipy`. See agents/ for reference implementations.
The harness injects `LLMClient` automatically — no API key required. Whitelisted models: `claude-haiku-4-5`, `claude-3-5-haiku`, `gpt-4o-mini`. See `examples/llm-agent/` for a complete working example.

Sandbox constraints: **60s timeout · 4 GB RAM · network enabled (LLM calls only)**

Libraries available: `build123d`, `OCP`, `gmsh`, `numpy`, `scipy`, `httpx`. See `agents/` for reference implementations.

---

@@ -135,11 +147,12 @@ All CPU. No GPU required.

Live: http://143.244.191.193:8000/sota

| Spec | Score | Agent | FEA Stress |
|---|---|---|---|
| 001 Wall Bracket | **27.22 g** | compact-arm | 13.8 / 25.0 MPa |
| 002 Equipment Mount | — | — | — |
| 003 Pipe-Clamp | 2799.52 g | baseline_steel | 22.18 / 82.0 MPa |
| Spec | Score | Agent |
|---|---|---|
| spec-001 Wall Bracket | **23.48 g** | sub-nano |
| spec-002 Equipment Mount | **25.84 g** | al-bracket-v19 |
| spec-003 Pipe-Clamp | **71.42 g** | ss-bracket-v15 |
| pub_001 – pub_005 | see leaderboard | various |

---

Empty file added agents/.keep
Empty file.
46 changes: 26 additions & 20 deletions agents/template/agent.py
@@ -1,17 +1,23 @@
"""
Template agent — start here.

Contract: implement generate(spec) -> bytes (STEP file).
Two supported signatures:

The eval harness calls generate() with the spec dict for each problem.
Return valid STEP bytes. The harness handles geometry checks and FEA;
your job is to return the lightest part that passes all constraints.
generate(spec: dict) -> bytes # static agent
generate(spec: dict, llm: LLMClient) -> bytes # LLM agent (recommended)

See QUICKSTART.md for a full walkthrough.
The harness detects which you use via inspect.signature and injects LLMClient
automatically if present — no API key required from you.

See QUICKSTART.md for a full walkthrough and examples/llm-agent/ for an
LLM agent example.
"""

from __future__ import annotations

# To use the LLM client, uncomment:
# from forge.sdk.llm import LLMClient

# TODO: import your geometry library
# from build123d import ... # recommended
# from OCP.BRepPrimAPI import ... # raw OCP (see agents/baseline/)
@@ -21,27 +27,27 @@ def generate(spec: dict) -> bytes:
    """
    Build and return a STEP file for the given spec.

    To use an LLM, change the signature to: generate(spec, llm: LLMClient)

    Args:
        spec: Problem specification dict. Key structure:
            spec["constraints"]["load_n"] — applied load in Newtons
            spec["constraints"]["load_point_mm"] — [x, y, z] load application point
            spec["constraints"]["build_volume_mm"] — [x, y, z] max bounding box
            spec["constraints"]["bolt_pattern_mm"] — [[y, z], ...] bolt hole centers (x=0 plane)
            spec["constraints"]["bolt_diameter_clearance_mm"] — minimum clearance diameter
            spec["constraints"]["min_wall_thickness_mm"] — minimum feature wall
            spec["constraints"]["max_overhang_deg"] — max overhang from vertical
            spec["material"] — material name (see benchmark/materials.py)
        spec: Problem specification dict. Key fields:
            spec["constraints"]["load_n"] — load in Newtons
            spec["constraints"]["load_point_mm"] — [x, y, z] load point
            spec["constraints"]["build_volume_mm"] — [x, y, z] bounding box
            spec["constraints"]["bolt_pattern_mm"] — [[y, z], ...] bolt centers
            spec["constraints"]["bolt_diameter_clearance_mm"] — hole clearance
            spec["constraints"]["min_wall_thickness_mm"] — minimum wall
            spec["constraints"]["max_overhang_deg"] — max printable overhang
            spec["material"] — material name
            spec["safety_factor"] — FEA stress safety factor
            spec["scoring"]["metric"] — "mass_grams" | "volume_mm3" | ...

    Returns:
        STEP file as raw bytes. Must be valid AP214IS STEP.
        STEP file as raw bytes (AP214IS schema required).

    Notes:
        - Must be deterministic: same spec → same bytes every call.
          If you use any randomness, fix the seed (e.g. random.seed(42)).
        - The FEA mesh uses C3D4 linear tets at ~2 mm characteristic length.
          Avoid features thinner than 3 mm — they produce degenerate elements.
        - Lower mass = better score. There is no ceiling; keep optimizing.
        - Must be deterministic: same spec → same bytes. Fix any random seeds.
        - Avoid features thinner than 3 mm — they produce degenerate FEA elements.
    """

    constraints = spec["constraints"]
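
As an illustration of how the constraint fields listed in the docstring might be used before building geometry, here is a small pre-check sketch. The helper names and the example numbers are hypothetical. The 3 mm floor comes from the note about degenerate FEA elements, and the real pass/fail decision is made by the harness, not by these functions.

```python
from typing import Sequence


def fits_build_volume(size_mm: Sequence[float], spec: dict) -> bool:
    """True when every extent is positive and within build_volume_mm."""
    limits = spec["constraints"]["build_volume_mm"]
    return all(0 < size <= limit for size, limit in zip(size_mm, limits))


def wall_ok(thickness_mm: float, spec: dict, fea_floor_mm: float = 3.0) -> bool:
    """True when a wall meets both the spec minimum and the 3 mm meshing floor."""
    minimum = max(spec["constraints"]["min_wall_thickness_mm"], fea_floor_mm)
    return thickness_mm >= minimum


# Hypothetical spec fragment for illustration only.
example_spec = {
    "constraints": {"build_volume_mm": [100, 60, 40], "min_wall_thickness_mm": 2.0}
}
```

Checks like these are cheap to run on LLM-proposed dimensions before spending time on geometry and export.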
13 changes: 12 additions & 1 deletion benchmark/_worker.py
@@ -15,12 +15,16 @@

import argparse
import importlib.util
import inspect
import json
import os
import resource
import sys
from pathlib import Path

# Make forge.sdk importable regardless of install state.
sys.path.insert(0, str(Path(__file__).parent.parent))

CPU_SECONDS = 150


@@ -53,7 +57,14 @@ def main() -> None:

    try:
        loader_spec.loader.exec_module(mod)
        step_bytes = mod.generate(spec)

        sig = inspect.signature(mod.generate)
        if len(sig.parameters) >= 2:
            from forge.sdk.llm import LLMClient
            llm = LLMClient()
            step_bytes = mod.generate(spec, llm)
        else:
            step_bytes = mod.generate(spec)
    except Exception as exc:
        print(f"{type(exc).__name__}: {exc}", file=sys.stderr)
        sys.exit(1)
9 changes: 8 additions & 1 deletion cli.py
@@ -472,8 +472,15 @@ def _run_evaluate(agent_path: str, spec_path: str, verbose: bool) -> dict:
        "--spec", spec_path,
        "--json",
    ]
    # Inherit environment; supply defaults so LLM agents work without extra setup.
    env = os.environ.copy()
    env.setdefault("FORGE_MODEL", "anthropic/claude-haiku-4-5")
    env.setdefault(
        "FORGE_MODEL_WHITELIST",
        "anthropic/claude-haiku-4-5,anthropic/claude-3-5-haiku,openai/gpt-4o-mini",
    )
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, cwd=str(ROOT))
        proc = subprocess.run(cmd, capture_output=True, text=True, cwd=str(ROOT), env=env)
    except FileNotFoundError:
        return {"passed": False, "stage": "error", "reason": "benchmark module not found — run from repo root"}
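
The `setdefault` calls in this hunk only fill in values that are missing, so anything already exported in the caller's shell takes precedence. A standalone sketch of that behaviour follows; `build_env` is a hypothetical helper, not a function in `cli.py`.

```python
import os
from typing import Dict, Mapping, Optional


def build_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and add Forge defaults without overriding existing values."""
    env = dict(os.environ if base is None else base)
    env.setdefault("FORGE_MODEL", "anthropic/claude-haiku-4-5")
    env.setdefault(
        "FORGE_MODEL_WHITELIST",
        "anthropic/claude-haiku-4-5,anthropic/claude-3-5-haiku,openai/gpt-4o-mini",
    )
    return env
```

Because the function works on a copy, the parent process environment is left unchanged, which matches the `os.environ.copy()` call in the diff.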
