The picture is the spec. SpecDD and TestDD only run on the mockup you approved. 144 Opus 4.7 agents · zero third-party services · two human clicks.
TDD drove code with tests. SpecDD drove code with specs.
We put PreviewDD in front. Mockup-first, eyes-first decision-making —
144 Opus 4.7 agents turn one line of idea into a frozen full-stack app
with only two human clicks.
You start any project without knowing what will get built. Specs go stale. Wireframes lie. By demo day, half the assumptions are wrong.
Preview-Driven Development (PDD) flips it. Before any spec, before any code, the harness renders the project as 9 to 26 different mockups in parallel — each by a different Opus 4.7 persona pulling your idea in a different direction. You see what could be built. You select one.
Preview is all you need. The selection IS the spec.
| Artifact | Link |
|---|---|
| 🎥 Demo video | https://www.youtube.com/watch?v=_xHL8SZqfyI (2:59) — full walkthrough, problem statement through frozen app |
| 💻 Repository | Two-Weeks-Team/PreviewForgeForClaudeCode |
| 📝 Written summary (100–200 words) | See TL;DR below |
| 📜 License | Apache-2.0 — fully open-source per hackathon rules |
| 👥 Team | Two-Weeks-Team (≤2 members per rules) |
| 🆕 New work only | Built from scratch during the hackathon window (Apr 21–28, 2026). See CHANGELOG. |
Preview Forge turns one line of idea into a frozen, deployable full-stack app — by inverting the order of software development.
TDD drove code with tests. SpecDD drove code with specs. Preview Forge puts PreviewDD in front: before any spec or code is written, 26 Claude Opus 4.7 agents diverge into 26 single-file HTML mockups. You pick one with your eyes at Gate H1 (one human click). The picture becomes the contract every downstream agent honors.
The plugin runs 144 Opus 4.7 sub-agents organized into a 6-tier engineering organization (Ideation · Panels · Spec · Engineering · QA · Judges · Auditors), wired together by 15 `/pf:*` slash commands and a 4-layer cross-run memory (Reflexion pattern). SpecDD and TestDD then drive the build to a freeze threshold of ≥499/500. Two human clicks total: H1 (design pick) and H2 (ship).

Built entirely on Anthropic-native primitives: Opus 4.7, Managed Agents, Memory Tool, Batch API, Files API, Context editing, Prompt caching, Claude Design. No third-party services in the plugin runtime. Apache-2.0 licensed.
Preview is all you need.
flowchart LR
A["💡 One-line idea"] --> B["I1 Socratic interview<br/>4 required Q"]
B --> C["① PreviewDD<br/>26 mockups diverge"]
C --> D{{"🔒 Gate H1<br/>(human, 1 click)"}}
D --> E["② SpecDD<br/>OpenAPI + nestia"]
E --> F["③ TestDD<br/>Tests + Judges + Auditors"]
F --> G{{"🚀 Gate H2<br/>(human, 1 click)"}}
G --> H["📦 Frozen full-stack app"]
style C fill:#d4a574,stroke:#7aa6c2,color:#000
style E fill:#7aa6c2,stroke:#7aa6c2,color:#000
style F fill:#84c984,stroke:#7aa6c2,color:#000
| Cycle | Drives | Locked artifact |
|---|---|---|
| ① Preview-Driven Development (PreviewDD) (new) | 26 mockups before any spec | chosen_preview.json + mockups/chosen.html |
| ② Spec-Driven Development (SpecDD) | OpenAPI drives implementation | specs/openapi.yaml + SHA-256 .lock |
| ③ Test-Driven Development (TestDD) | Score ≥499/500 to freeze | score/report.json + .frozen-hash |
All three cycles follow diverge → aggregate → lock. Two human gates, otherwise autonomous. Full v8.0 specification — 2,100+ lines, single HTML file.
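The "lock" step in each cycle can be pictured as pinning an artifact to its SHA-256 digest. The sketch below is illustrative only: the helper names and the lock file layout (a sibling `<name>.lock` holding one hex digest) are assumptions, not the plugin's actual format.

```python
import hashlib
from pathlib import Path


def write_lock(artifact: Path) -> Path:
    """Write <artifact>.lock containing the SHA-256 hex digest of the artifact."""
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    lock = artifact.with_name(artifact.name + ".lock")  # assumed naming scheme
    lock.write_text(digest + "\n", encoding="utf-8")
    return lock


def is_locked_and_unchanged(artifact: Path) -> bool:
    """True only if a lock exists and still matches the artifact's current bytes."""
    lock = artifact.with_name(artifact.name + ".lock")
    if not lock.exists():
        return False
    current = hashlib.sha256(artifact.read_bytes()).hexdigest()
    return lock.read_text(encoding="utf-8").strip() == current
```

Any edit to the artifact after locking changes the digest, so a downstream phase can refuse to run against a spec that no longer matches its lock.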
You type one line. The plugin doesn't dispatch 26 advocates immediately — it asks
4 required questions (5–8 optional) to capture target persona, primary surface,
killer feature, and must-have constraints. The answers compile to idea.spec.json —
a structured ground truth every downstream agent honors.
"build a fun, cheerful lunch recommender for office workers"
│
▼
┌──────────────────────────────────────────────────┐
│ I1 Idea Clarifier — 4 batched AskUserQuestion │
│ • target_persona • primary_surface │
│ • killer_feature • must_have_constraints │
└──────────────────────────────────────────────────┘
│
▼ idea.spec.json (the picture's contract)
┌──────────────────────────────────────────────────┐
│ 26 advocates diverge → gallery → you pick one │
└──────────────────────────────────────────────────┘
Why it matters. Before v1.6, the same one-liner could mean "Slack bot" or "legal-deposition paralegal." Same words, different products. The Socratic interview makes divergence intentional creative reframing, not blind misalignment. Skip-interview is one click if you want the demo escape hatch.
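As a rough picture of what the interview produces, the sketch below builds a spec with the four required fields named above and checks that none is empty. The example answers and the `missing_required` helper are hypothetical; the real `idea.spec.json` schema may carry more fields and different value types.

```python
import json

# The four required fields named in the interview description.
REQUIRED_FIELDS = (
    "target_persona",
    "primary_surface",
    "killer_feature",
    "must_have_constraints",
)


def missing_required(spec: dict) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not spec.get(name)]


# Hypothetical answers for the lunch-recommender one-liner.
spec = {
    "target_persona": "office workers",
    "primary_surface": "mobile web",
    "killer_feature": "one-tap lunch pick",
    "must_have_constraints": ["cheerful tone"],
}

print(json.dumps(spec, indent=2))
print(missing_required(spec))
```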
The contract is the picture selected at Gate H1. Drift is detected by the
Rule 9 idea-drift sentinel (hooks/idea-drift-detector.py) — block
threshold 0.3, warn at 0.4. If the build wanders away from the approved mockup,
the run pauses.
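The README later describes the drift measure as a containment coefficient over token sets. A minimal sketch of that idea follows. The tokenizer, the direction of the comparison (how much of the approved text survives in the candidate), and the boundary handling are assumptions; only the 0.3 and 0.4 thresholds come from the text.

```python
import re

BLOCK_BELOW = 0.3  # block threshold from the README
WARN_BELOW = 0.4   # warn threshold from the README


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def containment(reference: str, candidate: str) -> float:
    """Fraction of the reference's tokens that also appear in the candidate."""
    ref = tokens(reference)
    if not ref:
        return 1.0  # nothing to drift away from
    return len(ref & tokens(candidate)) / len(ref)


def classify(score: float) -> str:
    """Map a containment score to 'block', 'warn', or 'ok'."""
    if score < BLOCK_BELOW:
        return "block"
    if score < WARN_BELOW:
        return "warn"
    return "ok"
```

A low score means few of the approved mockup's terms survive downstream, which is why the block threshold sits below the warn threshold.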
The harness operates under ten contracts that no agent (including the
supervisor) can override. They are enforced by Layer-0 hooks
(PreToolUse / PostToolUse / Stop / SubagentStop). Layer-0 started at
seven rules in v1.0.0 and has grown to ten as the harness shipped (most
recent additions: Rule 9 idea-drift, Rule 10 English-only output).
- Two human gates only: H1 (design pick) and H2 (ship). Everything else is autonomous.
- Scope discipline: agents may not exceed their declared scope (department / file / phase).
- Single source of truth per phase: each phase locks one artifact (`idea.spec.json`, `chosen_preview.json`, `specs/openapi.yaml`, `score/report.json`).
- Adaptive thinking + `xhigh` effort wherever the action is one-shot and irreversible (freeze, deploy, schema lock).
- Cost-regression sentinel: `hooks/cost-regression.py` pauses the run when token usage crosses the active profile's hard ceiling.
- Two ways to ask: anchored gates the user knows are coming (H1, H2) and adaptive asks the harness fires on its own (Socratic, budget guard). Everything else: auto-decide.
- Audit trail: every agent decision writes to the SQLite blackboard; runs are deterministically replayable from `trace.jsonl`.
- All Opus 4.7: every agent fixed to `claude-opus-4-7`; no Sonnet or Haiku fallback for plugin runtime.
- Idea-drift detection: `hooks/idea-drift-detector.py` blocks runs where SpecDD wanders away from the H1 selection (block 0.3, warn 0.4).
- Output language English: every artifact in the repo is English. Korean and other languages are permitted only as visual subtitles in the captured video.
Full Layer-0 specification — gates, scope, drift, output policy.
# 1. Add this marketplace
/plugin marketplace add Two-Weeks-Team/PreviewForgeForClaudeCode
# 2. Install the plugin
/plugin install pf@two-weeks-team
# 3. Reload
/reload-plugins
# 4. Initialize memory + workspace permissions (first time per workspace)
/pf:bootstrap
# 5. Run (profile defaults to `standard` as of v1.4.0)
/pf:new "your one-line idea"
# …or pick a profile explicitly:
/pf:new "demo-class idea" --profile=standard # default — ~60k tok · 2×5 eng · 9 previews · SQLite · no Docker
/pf:new "real project" --profile=pro # ~250k tok · 3×5 eng · 18 previews · Postgres + Docker
/pf:new "production launch" --profile=max      # ~600k tok · 5×5 eng · 26 previews · full CI/CD

| Profile | Previews | Eng teams | DB | Container | Panels | SCC iter | P95 ceiling | Use for |
|---|---|---|---|---|---|---|---|---|
| standard (default) | 9 | 2×5 (BE+FE) | SQLite | ❌ none | keyword-trigger | 3 | ~60k tok / 25 min | Local MVP · demo · prototyping |
| pro | 18 | 3×5 (+DB) | Postgres (dev-prod parity) | Docker + compose | keyword-trigger + escalation | 4 | ~250k tok / 70 min | Real projects |
| max | 26 | 5×5 (all) | Postgres | Docker + CI/CD | always-on | 5 | ~600k tok / 160 min | Production · baselines |
- `--previews=N` overrides the count (bounded by `max_user_expand` = 26).
- `--no-cache` bypasses the PreviewDD-level cache (7 days for standard/pro, never cached for max).
- Standard = local-first: `npm install && npm run db:push && npm run dev`. No Docker, no Postgres setup. DB lives at `~/.preview-forge/<project>/dev.db` (outside repo tree for security).
- Upgrade path: standard → pro via `bash scripts/graduate.sh pro` (additive; keeps your code, adds Dockerfile/compose/Postgres datasource).
- Full spec: `plugins/preview-forge/profiles/`.
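To make the profile defaults and the `--previews=N` bound concrete, here is a small sketch. The preview counts, team counts, and databases are copied from the table; the `Profile` type, the `resolve_previews` helper, and the choice to clamp an out-of-range override (rather than reject it) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

MAX_USER_EXPAND = 26  # upper bound on --previews=N per the README


@dataclass(frozen=True)
class Profile:
    previews: int
    eng_teams: int
    database: str


# Values copied from the profile table.
PROFILES = {
    "standard": Profile(previews=9, eng_teams=2, database="SQLite"),
    "pro": Profile(previews=18, eng_teams=3, database="Postgres"),
    "max": Profile(previews=26, eng_teams=5, database="Postgres"),
}


def resolve_previews(profile: str = "standard", override: Optional[int] = None) -> int:
    """Preview count for a run; an override is clamped to [1, MAX_USER_EXPAND]."""
    if override is None:
        return PROFILES[profile].previews
    return max(1, min(override, MAX_USER_EXPAND))
```

For example, `resolve_previews("standard", 40)` yields 26 in this sketch, while omitting the override falls back to the profile's own count.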
Profile escalation & cost-regression sentinel (v1.3+)
When you run standard but your idea mentions enterprise signals (Stripe, PII, HIPAA, SSO provider, SOC2, multi-tenant), the plugin recommends the right profile before PreviewDD burns tokens.
Evaluation precedence (highest wins):
- Hard-require (Stripe / PII / HIPAA / auth-provider): any single hit forces upgrade. You cannot dismiss it, because false assurance is worse than friction. The `min_distinct_categories=2` floor does NOT apply here.
- Soft-suggest + category-floor (SOC2 / multi-tenant / B2B / scale): needs ≥2 distinct categories AND score ≥ threshold to ask via AskUserQuestion. Records your answer in `~/.preview-forge/escalation-history.json`. If you decline, same signals won't re-prompt within 24h (anti-nagging).
- Hint (weak signals, score < threshold but ≥ min-floor): shows "💡 Consider --profile=pro next time" in `/pf:status`, no interruption.
Categorical scoring (not raw keyword count) means "audit logging feature" in a generic marketing copy app won't false-positive.
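The precedence above can be sketched as a short decision function. The category names mirror the lists in the text, but the numeric `SCORE_THRESHOLD` and `MIN_FLOOR` values, the outcome labels, and the function name are hypothetical; the README does not state them.

```python
HARD_REQUIRE = {"stripe", "pii", "hipaa", "auth-provider"}
SOFT_SUGGEST = {"soc2", "multi-tenant", "b2b", "scale"}
MIN_DISTINCT_CATEGORIES = 2  # named in the README
SCORE_THRESHOLD = 3.0        # hypothetical value
MIN_FLOOR = 1.0              # hypothetical value


def evaluate(categories: set[str], score: float) -> str:
    """Return 'force-upgrade', 'ask', 'hint', or 'none' (highest precedence wins)."""
    # Hard-require: a single hit is enough; the category floor is ignored.
    if categories & HARD_REQUIRE:
        return "force-upgrade"
    # Soft-suggest: needs both enough distinct categories and enough score.
    soft_hits = categories & SOFT_SUGGEST
    if len(soft_hits) >= MIN_DISTINCT_CATEGORIES and score >= SCORE_THRESHOLD:
        return "ask"
    # Hint: weak signals above the floor. A high score from a single soft
    # category is not covered by the README; this sketch treats it as a hint.
    if score >= MIN_FLOOR:
        return "hint"
    return "none"
```

Checking the hard-require set first is what makes it impossible for a low score or a single category to mask a Stripe or HIPAA signal.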
Cost regression + drift detection. hooks/idea-drift-detector.py catches the failure where Gate H1 picks product A but SpecDD/Engineering drift to product B. Containment coefficient over token sets (no external ML deps). Block threshold 0.3, warn at 0.4. The P0-B cost-regression sentinel (hooks/cost-regression.py) compares cost-snapshot.json against the active profile's P95/hard ceiling every 30s. Hard breach triggers auto-pause + AskUserQuestion handoff.
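The cost-regression check can be pictured as classifying each snapshot against two ceilings. In the sketch below, the `total_tokens` field name, the hard-ceiling value, and the "warn" outcome for a P95 breach are assumptions; the README only states that a hard breach pauses the run.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Ceiling:
    p95_tokens: int   # soft ceiling (the table's "P95 ceiling")
    hard_tokens: int  # hard ceiling; the value used below is hypothetical


def check_snapshot(snapshot_json: str, ceiling: Ceiling) -> str:
    """Classify one cost snapshot as 'ok', 'warn', or 'pause'."""
    used = int(json.loads(snapshot_json)["total_tokens"])  # hypothetical field name
    if used >= ceiling.hard_tokens:
        return "pause"  # hard breach: auto-pause and hand off to the user
    if used >= ceiling.p95_tokens:
        return "warn"   # assumption: a P95 breach only warns
    return "ok"


# ~60k tokens is the standard profile's P95 from the table; 90k is made up.
STANDARD = Ceiling(p95_tokens=60_000, hard_tokens=90_000)
```

A polling loop would call `check_snapshot` on the latest `cost-snapshot.json` contents at the 30-second interval the text mentions.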
What's new — v1.6 / v1.7 / v1.14 (shipped through semver v1.10.0+)
Terminology: "v1.6 audit" / "v1.7 audit" are feature umbrella names (issue #28 family / #29–#37). Each PR ships under its own release-please semver tag — the v1.6 schema landed in semver v1.6.0, B-1/B-3/A-4 (Phase 9, PR #51) landed in v1.10.0, etc. See CHANGELOG.md.
v1.6 — Socratic interview as ground truth (LESSON 0.7 fix). Before v1.6, 26 Advocates dispatched directly from the one-liner — and the failure mode in LESSON 0.7 played out: a one-liner could mean different products to different agents. v1.6 adds I1 Idea Clarifier between /pf:new and the 26 advocates. Three batched AskUserQuestion modals (10–12 fields total) produce idea.spec.json — structured ground truth (target_persona, primary_surface, jobs_to_be_done, killer_feature, must_have_constraints, non_goals, …) that every advocate receives. The PreviewDD cache key now includes idea_spec_hash, so the same one-liner with different Socratic answers gets a fresh advocate set.
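One way to see why including `idea_spec_hash` in the cache key matters: two runs with the same one-liner but different interview answers must not share cached mockups. The sketch below is an assumption about how such a key could be derived (canonical JSON plus SHA-256); the function names other than `idea_spec_hash` and the exact key material are hypothetical.

```python
import hashlib
import json


def idea_spec_hash(spec: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def preview_cache_key(idea: str, profile: str, spec: dict) -> str:
    """Combine the one-liner, the profile, and the spec hash into one key."""
    material = "\n".join([idea.strip(), profile, idea_spec_hash(spec)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

With this shape, changing any Socratic answer changes the spec hash and therefore the cache key, which forces a fresh advocate set.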
v1.7 — 4 required questions, skip-interview, tiered fallback (Christensen + Kim-Mauborgne + Taleb). Hackathon demo feedback: 12 questions before seeing any output is too many. v1.7 trims:
- B-1 — 4 required, 5–8 optional. Best path: 4 clicks to gallery. Fullest path: 12 questions for deep dive.
- B-3 — Skip-interview button in Batch A. One click writes a 3-field stub and short-circuits to the v1.5.4 raw-idea path.
- A-4: `_filled_ratio` tiered fallback. The hard 0.5 gate is gone. `≥0.7` = high-confidence ground truth, `0.4–0.7` = hint, `0.2–0.4` = low-confidence, `<0.2` = drop spec entirely.
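The A-4 tiers map directly onto a small classifier. The tier labels and the treatment of the exact boundaries 0.4 and 0.2 (assigned to the higher tier here) are assumptions; the cut-off values come from the text.

```python
def spec_tier(filled_ratio: float) -> str:
    """Map a _filled_ratio value to the tier that decides how the spec is used."""
    if filled_ratio >= 0.7:
        return "ground-truth"    # high-confidence: advocates treat the spec as binding
    if filled_ratio >= 0.4:
        return "hint"            # used as guidance only
    if filled_ratio >= 0.2:
        return "low-confidence"  # kept, but weakly trusted
    return "drop"                # spec discarded entirely


for ratio in (0.9, 0.5, 0.25, 0.1):
    print(ratio, spec_tier(ratio))
```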
Why "gallery-first." The flow inverts the SaaS-onboarding default of "configure → preview." Instead: answer 4 questions → see 9 / 18 / 26 mockups → pick one. The picture is the spec. SpecDD and TestDD only run on the picture you approved. (Godin: lead with the artifact, not the form.)
v1.14 — post-gate automation. H1 now auto-advances to SpecDD once
chosen_preview.json.lock and design-approved.json exist. H2 now
auto-launches the local preview server after ship approval, and /pf:preview
handles manual re-open, stop, and status.
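The H1 auto-advance condition amounts to checking that both marker files exist. A minimal sketch, assuming both files sit directly in a per-run directory (the directory layout and the helper name are hypothetical):

```python
from pathlib import Path

# Both files must exist before SpecDD starts, per the v1.14 note.
REQUIRED_MARKERS = ("chosen_preview.json.lock", "design-approved.json")


def ready_for_specdd(run_dir: Path) -> bool:
    """True once every H1 marker file is present in the run directory."""
    return all((run_dir / name).is_file() for name in REQUIRED_MARKERS)
```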
Updating & downgrading
# Check installed version
claude plugin list | grep -A2 pf@two-weeks-team
# Pull the latest manifest + plugin contents from the marketplace
/plugin marketplace update two-weeks-team
# Upgrade the plugin to the newest listed version
/plugin update pf@two-weeks-team # if you have this subcommand
# — or, if update is not available in your Claude Code version —
/plugin uninstall pf@two-weeks-team
/plugin install pf@two-weeks-team
# Reload so hooks, agents, and commands refresh
/reload-plugins

After updating, run `pf check` (or `/pf:bootstrap` once, then `pf check`) to confirm your local `~/.claude/preview-forge/memory/` is still intact. The update does not overwrite your LESSONS.md.
Downgrading:
/plugin uninstall pf@two-weeks-team
/plugin install pf@two-weeks-team@1.0.0   # any past version tag

Every release is signed via GitHub Releases.
Preview Forge ships 15 slash commands under the /pf:* namespace:
| Command | Purpose |
|---|---|
| `/pf:bootstrap` | Initialize plugin memory + seed workspace Bash permissions (first time per workspace) |
| `/pf:new <idea>` | Start a new run (PreviewDD cycle begins) |
| `/pf:status` | Current run state, agent progress, blackboard |
| `/pf:retry <agent\|phase>` | Rerun a failed agent or stuck phase |
| `/pf:freeze` | Force Judges + Auditors evaluation (TestDD Stage 7) |
| `/pf:preview [run]` | Re-open, stop, or inspect the local preview server for a frozen run (auto-launched after H2) |

| Command | Purpose |
|---|---|
| `/pf:design` | Gate H1: Claude Design main / built-in Studio fallback |
| `/pf:panel` | Manually trigger 4-Panel (TP/BP/UP/RP) vote |

| Command | Purpose |
|---|---|
| `/pf:gallery` | Browse / fork past runs |
| `/pf:replay <run>` | Deterministic replay from trace.jsonl |
| `/pf:seed` | Pre-verified demo idea bank (10) |
| `/pf:export <run>` | Package frozen run as tarball or Claude Code plugin |

| Command | Purpose |
|---|---|
| `/pf:budget` | Cost dashboard: per-run / per-cycle / per-agent |
| `/pf:lessons` | Cross-run failure catalog (LESSONS.md) |
| `/pf:help` | Full 15-command reference + FAQ |
Preview Forge's 144 agents live in a 6-tier hierarchy + SQLite blackboard:
                      M1 Run Supervisor (Meta)
                                 │
                ┌────────────────┼────────────────┐
                │                │                │
         M2 Cost Monitor  M3 Chief Eng PM Software-Factory
         (tracking only) (all dept leads)   Layer-0 Hooks
                                 │
      ┌──────────┬───────────────┼────────────────┬─────────────┐
      │          │               │                │             │
  Ideation  4 Panels +       Spec Dept      5 Engineering   QA Dept +
    Dept    Mitigation          (9)          Teams (25)  SCC + Judges +
    (29)   Designer (45)                                 Auditors + Docs
                                                              (33)
Count: 3 Meta + 29 Ideation + 45 Panels + 9 Spec + 25 Engineering + 14 QA + 6 SCC + 5 Judges + 5 Auditors + 3 Docs = 144. All Opus 4.7, zero Sonnet/Haiku.
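The headcount can be checked mechanically. The figures below are copied from the count line; the dictionary layout is just one convenient way to hold them.

```python
# Agents per department, as listed in the count line above.
HEADCOUNT = {
    "Meta": 3,
    "Ideation": 29,
    "Panels": 45,
    "Spec": 9,
    "Engineering": 25,
    "QA": 14,
    "SCC": 6,
    "Judges": 5,
    "Auditors": 5,
    "Docs": 3,
}

total = sum(HEADCOUNT.values())
# The right-most branch of the org chart groups these five departments.
qa_branch = sum(HEADCOUNT[name] for name in ("QA", "SCC", "Judges", "Auditors", "Docs"))

print(total, qa_branch)  # prints: 144 33
```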
- Claude Code (latest) with Pro / Max / Team / Enterprise subscription. (No separate API key needed.)
- Node.js 20 LTS + pnpm 9 (for scaffolded apps' build/test)
- Docker 24+ (optional, for scaffolded apps' `docker compose up` verification)
| Area | Count | Summary |
|---|---|---|
| Agents | 144 | 10 departments, 6 tiers, all Opus 4.7 |
| Slash commands | 15 | /pf:* namespace |
| Hooks | 7 | factory-policy, askuser-enforcement, auto-retro-trigger, idea-drift-detector, cost-regression, escalation-ledger, post-h1-signal |
| Memory seed | 3 | CLAUDE.md + PROGRESS.md + LESSONS.md (with 3 bootstrap lessons) |
| Methodology | 1 | Layer-0: 10 non-negotiable rules (7 in v1.0.0) |
| Asset templates | 12 | Docker Compose, Caddyfile, nestia.config.ts, install.sh + 8 standard-profile build templates |
| JSON schemas | 6 | preview-card, panel-vote, score-report, pf-profile, idea-spec, spec-anchor-audit |
| Seed ideas | 10 | Pre-verified demo scenarios |
| CLI | 1 | bin/pf |
| Verification | 1 | scripts/verify-plugin.sh |
Preview Forge's plugin runtime uses only Anthropic-native services:
- Claude Code (Pro/Max) · Claude Opus 4.7 · Adaptive thinking · `xhigh` effort
- Claude Managed Agents · Anthropic Memory Tool · Batch API · Files API · Citations
- Context editing (`context-management-2025-06-27`) · Compaction (`compact_20260112`)
- Prompt caching (1-hour TTL) · Fine-grained tool streaming · Task budgets (`task-budgets-2026-03-13`)
- Claude Design (Gate H1 main) · Built-in Design Studio (Gate H1 fallback)
Not used in the plugin runtime or generated mockups: Figma, Google Fonts, external CDNs, hosted analytics services. All 26 mockups are single-file HTML with inline styles only.
A 4-layer memory so mistakes don't repeat across runs:
- `memory/CLAUDE.md`: session rules (read first every run)
- `memory/PROGRESS.md`: run index (updated at run end)
- `memory/LESSONS.md`: failure catalog (auto-appended by Auto-retro critic)
- Anthropic Memory Tool (`memory_20250818`): per-agent episodic memory (Reflexion pattern)
M1 Run Supervisor reads all four before every new run and pre-loads relevant lessons to every Department Lead.
- 📘 Full v8.0 Specification — canonical, 2,100+ lines
- 📝 CHANGELOG — phase-by-phase build log
- 🛡️ Security Policy — reporting and scope
- 🤝 Contributing — LESSONS, new advocates, etc.
- 🪶 Layer-0 Rules — gates, scope, drift, and output policy
git clone https://github.com/Two-Weeks-Team/PreviewForgeForClaudeCode
cd PreviewForgeForClaudeCode
bash scripts/verify-plugin.sh

Apache-2.0. See NOTICE for attribution.
Built with Claude Opus 4.7 · Powered by Claude Code Plugins · No third-party services in the plugin runtime · Apache-2.0
Preview Forge · Two-Weeks-Team
Preview is all you need.
