Two-Weeks-Team/PreviewForgeForClaudeCode

Preview Forge for Claude Code

Preview is all you need.

One line of idea → 26 AI-generated previews → pick with your eyes → frozen full-stack app.

Gate H1 gallery — 9 advocate-rendered mockups for the LunchPull demo run

The picture is the spec. SpecDD and TestDD only run on the mockup you approved. 144 Opus 4.7 agents · zero third-party services · two human clicks.

TDD drove code with tests. SpecDD drove code with specs. We put PreviewDD in front. Mockup-first, eyes-first decision-making — 144 Opus 4.7 agents turn one line of idea into a frozen full-stack app with only two human clicks.


The problem

You start any project without knowing what will get built. Specs go stale. Wireframes lie. By demo day, half the assumptions have turned out wrong.

Preview-Driven Development (PDD) flips it. Before any spec, before any code, the harness renders the project as 9 to 26 different mockups in parallel — each by a different Opus 4.7 persona pulling your idea in a different direction. You see what could be built. You select one.

Preview is all you need. The selection IS the spec.


Submission — Built with Opus 4.7 hackathon

| Artifact | Link |
| --- | --- |
| 🎥 Demo video | https://www.youtube.com/watch?v=_xHL8SZqfyI (2:59): full walkthrough, problem statement through frozen app |
| 💻 Repository | Two-Weeks-Team/PreviewForgeForClaudeCode |
| 📝 Written summary (100–200 words) | See TL;DR below |
| 📜 License | Apache-2.0, fully open-source per hackathon rules |
| 👥 Team | Two-Weeks-Team (≤2 members per rules) |
| 🆕 New work only | Built from scratch during the hackathon window (Apr 21–28, 2026). See CHANGELOG. |

TL;DR

Preview Forge turns one line of idea into a frozen, deployable full-stack app — by inverting the order of software development.

TDD drove code with tests. SpecDD drove code with specs. Preview Forge puts PreviewDD in front: before any spec or code is written, 26 Claude Opus 4.7 agents diverge into 26 single-file HTML mockups. You pick one with your eyes at Gate H1 (one human click). The picture becomes the contract every downstream agent honors.

The plugin runs 144 Opus 4.7 sub-agents organized into a 6-tier engineering organization (Ideation · Panels · Spec · Engineering · QA · Judges · Auditors), wired together by 15 /pf:* slash commands and a 4-layer cross-run memory (Reflexion pattern). SpecDD and TestDD then drive the build to a freeze threshold of ≥499/500. Two human clicks total — H1 (design pick), H2 (ship).

Built entirely on Anthropic-native primitives — Opus 4.7, Managed Agents, Memory Tool, Batch API, Files API, Context editing, Prompt caching, Claude Design. No third-party services in the plugin runtime. Apache-2.0 licensed.

Preview is all you need.

The 3-DD methodology

flowchart LR
    A["💡 One-line idea"] --> B["I1 Socratic interview<br/>4 required Q"]
    B --> C["① PreviewDD<br/>26 mockups diverge"]
    C --> D{{"🔒 Gate H1<br/>(human, 1 click)"}}
    D --> E["② SpecDD<br/>OpenAPI + nestia"]
    E --> F["③ TestDD<br/>Tests + Judges + Auditors"]
    F --> G{{"🚀 Gate H2<br/>(human, 1 click)"}}
    G --> H["📦 Frozen full-stack app"]

    style C fill:#d4a574,stroke:#7aa6c2,color:#000
    style E fill:#7aa6c2,stroke:#7aa6c2,color:#000
    style F fill:#84c984,stroke:#7aa6c2,color:#000

| Cycle | Drives | Locked artifact |
| --- | --- | --- |
| Preview-Driven Development (PreviewDD) (new) | 26 mockups before any spec | chosen_preview.json + mockups/chosen.html |
| Spec-Driven Development (SpecDD) | OpenAPI drives implementation | specs/openapi.yaml + SHA-256 .lock |
| Test-Driven Development (TestDD) | Score ≥499/500 to freeze | score/report.json + .frozen-hash |

All three cycles follow diverge → aggregate → lock. Two human gates, otherwise autonomous. Full v8.0 specification — 2,100+ lines, single HTML file.
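
The "locked artifact" idea can be pictured as a content hash stored beside each file. The following is a minimal sketch, assuming the `.lock` file holds only the hex SHA-256 digest of the artifact; the plugin's real lock format is not described here, and `write_lock` and `is_locked_unchanged` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_lock(artifact: Path) -> Path:
    """Write '<artifact>.lock' containing the digest (assumed format)."""
    lock = artifact.with_name(artifact.name + ".lock")
    lock.write_text(sha256_of(artifact) + "\n", encoding="utf-8")
    return lock


def is_locked_unchanged(artifact: Path) -> bool:
    """True when the artifact still matches the digest stored in its lock."""
    lock = artifact.with_name(artifact.name + ".lock")
    if not lock.exists():
        return False
    return lock.read_text(encoding="utf-8").strip() == sha256_of(artifact)
```

Under these assumptions, any edit to specs/openapi.yaml after the lock is written makes `is_locked_unchanged` return False, which is the signal a downstream phase would need in order to refuse to continue.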

From one prompt to a gallery — in 4 questions

You type one line. The plugin doesn't dispatch 26 advocates immediately — it asks 4 required questions (5–8 optional) to capture target persona, primary surface, killer feature, and must-have constraints. The answers compile to idea.spec.json — a structured ground truth every downstream agent honors.

"build a fun, cheerful lunch recommender for office workers"
        │
        ▼
┌──────────────────────────────────────────────────┐
│  I1 Idea Clarifier — 4 batched AskUserQuestion    │
│  • target_persona   • primary_surface             │
│  • killer_feature   • must_have_constraints       │
└──────────────────────────────────────────────────┘
        │
        ▼  idea.spec.json   (the picture's contract)
┌──────────────────────────────────────────────────┐
│  26 advocates diverge → gallery → you pick one    │
└──────────────────────────────────────────────────┘

Why it matters. Before v1.6, the same one-liner could mean "Slack bot" or "legal-deposition paralegal." Same words, different products. The Socratic interview makes divergence intentional creative reframing, not blind misalignment. Skip-interview is one click if you want the demo escape hatch.

Pixel-faithful delivery

The contract is the picture selected at Gate H1. Drift is detected by the Rule 9 idea-drift sentinel (hooks/idea-drift-detector.py) — block threshold 0.3, warn at 0.4. If the build wanders away from the approved mockup, the run pauses.
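
To make the thresholds concrete, here is a minimal sketch of a drift check based on a containment coefficient over token sets, the measure the escalation section names for this hook. It is not the code in hooks/idea-drift-detector.py. The tokenizer, the choice of the approved mockup text as the reference set, and the reading of 0.3 and 0.4 as "block below" and "warn below" are all assumptions.

```python
import re

BLOCK_BELOW = 0.3  # threshold from the text; the "below" direction is assumed
WARN_BELOW = 0.4


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a stand-in for the hook's real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def containment(reference: str, candidate: str) -> float:
    """Share of the reference's tokens that also appear in the candidate."""
    ref = tokens(reference)
    if not ref:
        return 1.0  # nothing to drift away from
    return len(ref & tokens(candidate)) / len(ref)


def drift_verdict(reference: str, candidate: str) -> str:
    """Classify a candidate artifact against the approved reference text."""
    score = containment(reference, candidate)
    if score < BLOCK_BELOW:
        return "block"
    if score < WARN_BELOW:
        return "warn"
    return "ok"
```

With this reading, a spec that shares no vocabulary with the approved mockup scores 0.0 and is blocked, while one that keeps every reference token scores 1.0 and passes.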

Layer-0 — the ten non-negotiable rules

The harness operates under ten contracts that no agent (including the supervisor) can override. They are enforced by Layer-0 hooks (PreToolUse / PostToolUse / Stop / SubagentStop). Layer-0 started at seven rules in v1.0.0 and has grown to ten as the harness shipped (most recent additions: Rule 9 idea-drift, Rule 10 English-only output).

  1. Two human gates only — H1 (design pick) and H2 (ship). Everything else is autonomous.
  2. Scope discipline — agents may not exceed their declared scope (department / file / phase).
  3. Single source of truth per phase — each phase locks one artifact (idea.spec.json, chosen_preview.json, specs/openapi.yaml, score/report.json).
  4. Adaptive thinking + xhigh effort wherever the action is one-shot and irreversible (freeze, deploy, schema lock).
  5. Cost-regression sentinel: hooks/cost-regression.py pauses the run when token usage crosses the active profile's hard ceiling.
  6. Two ways to ask — anchored gates the user knows are coming (H1, H2) and adaptive asks the harness fires on its own (Socratic, budget guard). Everything else: auto-decide.
  7. Audit trail — every agent decision writes to the SQLite blackboard; runs are deterministically replayable from trace.jsonl.
  8. All Opus 4.7 — every agent fixed to claude-opus-4-7; no Sonnet or Haiku fallback for plugin runtime.
  9. Idea-drift detection: hooks/idea-drift-detector.py blocks runs where SpecDD wanders away from the H1 selection (block 0.3, warn 0.4).
  10. Output language English — every artifact in the repo is English. Korean and other languages are permitted only as visual subtitles in the captured video.

Full Layer-0 specification — gates, scope, drift, output policy.
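
Rule 7's audit trail can be pictured as an append-only JSON Lines file. The sketch below assumes each decision is stored as one JSON object per line in trace.jsonl. The field names (`seq`, `agent`, `decision`) and both helper functions are hypothetical, and the SQLite blackboard is left out entirely.

```python
import json
from pathlib import Path
from typing import Iterator


def append_event(trace: Path, seq: int, agent: str, decision: str) -> None:
    """Append one decision as a single JSON line (field names are hypothetical)."""
    record = {"seq": seq, "agent": agent, "decision": decision}
    with trace.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def replay(trace: Path) -> Iterator[dict]:
    """Yield recorded decisions in the order they were written."""
    with trace.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

Because the file is only ever appended to, reading it top to bottom reproduces the same sequence of decisions on every replay, which is the property that deterministic replay depends on.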

Quick install

# 1. Add this marketplace
/plugin marketplace add Two-Weeks-Team/PreviewForgeForClaudeCode

# 2. Install the plugin
/plugin install pf@two-weeks-team

# 3. Reload
/reload-plugins

# 4. Initialize memory + workspace permissions (first time per workspace)
/pf:bootstrap

# 5. Run (profile defaults to `standard` as of v1.4.0)
/pf:new "your one-line idea"

# …or pick a profile explicitly:
/pf:new "demo-class idea"     --profile=standard   # default — ~60k tok · 2×5 eng · 9 previews · SQLite · no Docker
/pf:new "real project"        --profile=pro         # ~250k tok · 3×5 eng · 18 previews · Postgres + Docker
/pf:new "production launch"   --profile=max         # ~600k tok · 5×5 eng · 26 previews · full CI/CD

Profiles (v1.4+)

| Profile | Previews | Eng teams | DB | Container | Panels | SCC iter | P95 ceiling | Use for |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| standard (default) | 9 | 2×5 (BE+FE) | SQLite | ❌ none | keyword-trigger | 3 | ~60k tok / 25 min | Local MVP · demo · prototyping |
| pro | 18 | 3×5 (+DB) | Postgres (dev-prod parity) | Docker + compose | keyword-trigger + escalation | 4 | ~250k tok / 70 min | Real projects |
| max | 26 | 5×5 (all) | Postgres | Docker + CI/CD | always-on | 5 | ~600k tok / 160 min | Production · baselines |

  • --previews=N overrides the count (bounded by max_user_expand = 26).
  • --no-cache bypasses the PreviewDD-level cache (7 days for standard/pro, never cached for max).
  • Standard = local-first: npm install && npm run db:push && npm run dev — no Docker, no Postgres setup. DB lives at ~/.preview-forge/<project>/dev.db (outside repo tree for security).
  • Upgrade path: standard → pro via bash scripts/graduate.sh pro (additive; keeps your code, adds Dockerfile/compose/Postgres datasource).
  • Full spec: plugins/preview-forge/profiles/.
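
The flag rules above can be sketched as a small resolver. This is an illustration only: the preview counts and cache lifetimes come from the table and bullets, while the `Profile` dataclass, the function names, and the choice to clamp out-of-range `--previews` values instead of rejecting them are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

MAX_USER_EXPAND = 26  # upper bound named in the text


@dataclass(frozen=True)
class Profile:
    name: str
    previews: int
    cache_ttl_days: Optional[int]  # None means the profile is never cached


PROFILES = {
    "standard": Profile("standard", 9, 7),
    "pro": Profile("pro", 18, 7),
    "max": Profile("max", 26, None),
}


def resolve_previews(profile: str = "standard", previews: Optional[int] = None) -> int:
    """Preview count for a run: the profile default unless --previews=N is given."""
    if previews is None:
        return PROFILES[profile].previews
    # Assumption: values outside 1..MAX_USER_EXPAND are clamped, not rejected.
    return max(1, min(previews, MAX_USER_EXPAND))


def cache_enabled(profile: str, no_cache: bool = False) -> bool:
    """The PreviewDD cache applies unless --no-cache is set or the profile opts out."""
    return not no_cache and PROFILES[profile].cache_ttl_days is not None
```

For example, `resolve_previews("standard", 40)` returns 26 under the clamping assumption, and `cache_enabled("max")` is False because max runs are never cached.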

Profile escalation & cost-regression sentinel (v1.3+)

When you run standard but your idea mentions enterprise signals (Stripe, PII, HIPAA, SSO provider, SOC2, multi-tenant), the plugin recommends the right profile before PreviewDD burns tokens.

Evaluation precedence (highest wins):

  1. Hard-require (Stripe / PII / HIPAA / auth-provider): any single hit forces upgrade. You cannot dismiss — false assurance is worse than friction. The min_distinct_categories=2 floor does NOT apply here.
  2. Soft-suggest + category-floor (SOC2 / multi-tenant / B2B / scale): needs ≥2 distinct categories AND score ≥ threshold to ask via AskUserQuestion. Records your answer in ~/.preview-forge/escalation-history.json. If you decline, same signals won't re-prompt within 24h (anti-nagging).
  3. Hint (weak signals, score < threshold but ≥ min-floor): shows "💡 Consider --profile=pro next time" in /pf:status, no interruption.

Categorical scoring (not raw keyword count) means "audit logging feature" in a generic marketing copy app won't false-positive.
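
The precedence order can be sketched as a three-step check. The keyword lists, category names, and the numeric values of `SOFT_THRESHOLD` and `HINT_FLOOR` below are hypothetical; only the ordering, the "any single hit" rule, and the two-category floor come from the text. The score here is just the number of distinct categories hit, so the real scoring is probably richer.

```python
from typing import Dict, Set

# Illustrative signal tables; the plugin's real lists are not shown in this README.
HARD_REQUIRE: Set[str] = {"stripe", "pii", "hipaa", "sso"}
SOFT_CATEGORIES: Dict[str, Set[str]] = {
    "compliance": {"soc2"},
    "tenancy": {"multi-tenant"},
    "market": {"b2b"},
    "scale": {"scale"},
}
MIN_DISTINCT_CATEGORIES = 2  # floor named in the text
SOFT_THRESHOLD = 2.0         # hypothetical value
HINT_FLOOR = 1.0             # hypothetical value


def evaluate(idea: str) -> str:
    """Return the escalation outcome for a one-line idea."""
    words = set(idea.lower().replace(",", " ").split())
    # 1. Hard-require: any single hit forces the upgrade; the category floor is skipped.
    if words & HARD_REQUIRE:
        return "force-upgrade"
    # 2. Soft-suggest: count distinct categories, not raw keyword occurrences.
    hit = {cat for cat, keywords in SOFT_CATEGORIES.items() if words & keywords}
    score = float(len(hit))
    if len(hit) >= MIN_DISTINCT_CATEGORIES and score >= SOFT_THRESHOLD:
        return "ask-user"
    # 3. Hint: a weak signal that is surfaced without interrupting the run.
    if score >= HINT_FLOOR:
        return "hint"
    return "none"
```

Scoring by category means that repeating one keyword many times never pushes an idea past the soft-suggest floor. The 24-hour anti-nagging window and the history file are not modeled here.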

Cost regression + drift detection. hooks/idea-drift-detector.py catches the failure where Gate H1 picks product A but SpecDD/Engineering drift to product B. It scores the two with a containment coefficient over token sets (no external ML dependencies): block threshold 0.3, warn at 0.4. The P0-B cost-regression sentinel (hooks/cost-regression.py) compares cost-snapshot.json against the active profile's P95 and hard ceilings every 30s. A hard breach triggers an auto-pause and an AskUserQuestion handoff.
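
The cost check can be sketched as a single comparison that a poller would call on each tick. This is an illustration under stated assumptions: the `tokens_used` field name is hypothetical, the ceilings are passed in because the hard-ceiling values are not given here, and the "warn" outcome for a P95 breach is a guess at behavior the text does not spell out.

```python
import json
from pathlib import Path


def check_cost(snapshot_path: Path, p95_ceiling: int, hard_ceiling: int) -> str:
    """Compare the latest cost snapshot against the active profile's ceilings."""
    snapshot = json.loads(snapshot_path.read_text(encoding="utf-8"))
    used = snapshot["tokens_used"]  # hypothetical field name
    if used >= hard_ceiling:
        return "pause"  # hard breach: auto-pause and hand off to the user
    if used >= p95_ceiling:
        return "warn"   # assumed outcome for a P95 breach
    return "ok"
```

A sentinel loop would call `check_cost` roughly every 30 seconds and stop dispatching agents as soon as it returns "pause".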

What's new — v1.6 / v1.7 / v1.14 (shipped through semver v1.10.0+)

Terminology: "v1.6 audit" / "v1.7 audit" are feature umbrella names (issue #28 family / #29–#37). Each PR ships under its own release-please semver tag — the v1.6 schema landed in semver v1.6.0, B-1/B-3/A-4 (Phase 9, PR #51) landed in v1.10.0, etc. See CHANGELOG.md.

v1.6 — Socratic interview as ground truth (LESSON 0.7 fix). Before v1.6, 26 Advocates dispatched directly from the one-liner — and the failure mode in LESSON 0.7 played out: a one-liner could mean different products to different agents. v1.6 adds I1 Idea Clarifier between /pf:new and the 26 advocates. Three batched AskUserQuestion modals (10–12 fields total) produce idea.spec.json — structured ground truth (target_persona, primary_surface, jobs_to_be_done, killer_feature, must_have_constraints, non_goals, …) that every advocate receives. The PreviewDD cache key now includes idea_spec_hash, so the same one-liner with different Socratic answers gets a fresh advocate set.
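
To show why adding `idea_spec_hash` to the cache key separates runs that share a one-liner, here is a minimal sketch. The canonical-JSON hashing and the key layout (idea, profile, spec hash) are assumptions; the plugin's real key format is not documented here.

```python
import hashlib
import json


def idea_spec_hash(spec: dict) -> str:
    """Hash of the spec that ignores key order and whitespace."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def preview_cache_key(idea: str, profile: str, spec: dict) -> str:
    """Combine idea, profile, and spec hash into one key (assumed layout)."""
    raw = "\n".join([idea.strip(), profile, idea_spec_hash(spec)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Two runs with the same one-liner but different `target_persona` answers produce different keys, so the second run cannot be served a cached advocate set built for the first.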

v1.7 — 4 required questions, skip-interview, tiered fallback (Christensen + Kim-Mauborgne + Taleb). Hackathon demo feedback: 12 questions before seeing any output is too many. v1.7 trims:

  • B-1 — 4 required, 5–8 optional. Best path: 4 clicks to gallery. Fullest path: 12 questions for deep dive.
  • B-3 — Skip-interview button in Batch A. One click writes a 3-field stub and short-circuits to the v1.5.4 raw-idea path.
  • A-4: _filled_ratio tiered fallback. The hard 0.5 gate is gone. ≥0.7 = high-confidence ground truth, 0.4–0.7 = hint, 0.2–0.4 = low-confidence, <0.2 = drop spec entirely.
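
The tiered fallback in A-4 can be sketched as a ratio plus a lookup. The field list is taken from the v1.6 description, but two things are assumptions: that the ratio is the fraction of fields with a non-empty answer, and that each tier's lower bound is inclusive.

```python
from typing import Mapping, Sequence

# Field names from the v1.6 description; the full schema is not shown here.
SPEC_FIELDS = (
    "target_persona", "primary_surface", "jobs_to_be_done",
    "killer_feature", "must_have_constraints", "non_goals",
)


def filled_ratio(spec: Mapping[str, object], fields: Sequence[str] = SPEC_FIELDS) -> float:
    """Fraction of fields with a non-empty answer (assumed definition)."""
    filled = sum(1 for name in fields if spec.get(name) not in (None, "", [], {}))
    return filled / len(fields)


def confidence_tier(ratio: float) -> str:
    """Map a ratio to the four tiers; lower bounds are treated as inclusive."""
    if ratio >= 0.7:
        return "high-confidence"
    if ratio >= 0.4:
        return "hint"
    if ratio >= 0.2:
        return "low-confidence"
    return "drop"
```

Under this definition, a spec with three of the six fields answered has a ratio of 0.5 and is passed to advocates as a hint, not as ground truth.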

Why "gallery-first." The flow inverts the SaaS-onboarding default of "configure → preview." Instead: answer 4 questions → see 9 / 18 / 26 mockups → pick one. The picture is the spec. SpecDD and TestDD only run on the picture you approved. (Godin: lead with the artifact, not the form.)

v1.14 — post-gate automation. H1 now auto-advances to SpecDD once chosen_preview.json.lock and design-approved.json exist. H2 now auto-launches the local preview server after ship approval, and /pf:preview handles manual re-open, stop, and status.
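
The H1 auto-advance condition reduces to a check that both files are present. A minimal sketch, assuming both files sit directly in a run directory; the directory layout and the `h1_ready` helper are hypothetical.

```python
from pathlib import Path


def h1_ready(run_dir: Path) -> bool:
    """True once both Gate H1 artifacts exist, so SpecDD may start."""
    required = ("chosen_preview.json.lock", "design-approved.json")
    return all((run_dir / name).exists() for name in required)
```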

Updating & downgrading

# Check installed version
claude plugin list | grep -A2 pf@two-weeks-team

# Pull the latest manifest + plugin contents from the marketplace
/plugin marketplace update two-weeks-team

# Upgrade the plugin to the newest listed version
/plugin update pf@two-weeks-team     # if you have this subcommand
#   — or, if update is not available in your Claude Code version —
/plugin uninstall pf@two-weeks-team
/plugin install pf@two-weeks-team

# Reload so hooks, agents, and commands refresh
/reload-plugins

After updating, run pf check (or /pf:bootstrap once, then pf check) to confirm your local ~/.claude/preview-forge/memory/ is still intact — the update does not overwrite your LESSONS.md.

Downgrading:

/plugin uninstall pf@two-weeks-team
/plugin install pf@two-weeks-team@1.0.0    # any past version tag

Every release is signed via GitHub Releases.

Slash Commands

Preview Forge ships 15 slash commands under the /pf:* namespace:

🚀 Run lifecycle

| Command | Purpose |
| --- | --- |
| /pf:bootstrap | Initialize plugin memory + seed workspace Bash permissions (first time per workspace) |
| /pf:new <idea> | Start a new run (PreviewDD cycle begins) |
| /pf:status | Current run state, agent progress, blackboard |
| /pf:retry <agent\|phase> | Rerun a failed agent or stuck phase |
| /pf:freeze | Force Judges + Auditors evaluation (TestDD Stage 7) |
| /pf:preview [run] | Re-open, stop, or inspect the local preview server for a frozen run (auto-launched after H2) |

🗳️ Decision gates

| Command | Purpose |
| --- | --- |
| /pf:design | Gate H1: Claude Design main / built-in Studio fallback |
| /pf:panel | Manually trigger 4-Panel (TP/BP/UP/RP) vote |

📚 Assets & history

| Command | Purpose |
| --- | --- |
| /pf:gallery | Browse / fork past runs |
| /pf:replay <run> | Deterministic replay from trace.jsonl |
| /pf:seed | Pre-verified demo idea bank (10) |
| /pf:export <run> | Package frozen run as tarball or Claude Code plugin |

📊 Observability

| Command | Purpose |
| --- | --- |
| /pf:budget | Cost dashboard: per-run / per-cycle / per-agent |
| /pf:lessons | Cross-run failure catalog (LESSONS.md) |
| /pf:help | Full 15-command reference + FAQ |

Agent Organization

Preview Forge's 144 agents live in a 6-tier hierarchy + SQLite blackboard:

                        M1 Run Supervisor (Meta)
                               │
              ┌────────────────┼────────────────┐
              │                │                │
      M2 Cost Monitor     M3 Chief Eng PM   Software-Factory
       (tracking only)  (all dept leads)   Layer-0 Hooks
                               │
    ┌──────────┬───────────────┼────────────────┬─────────────┐
    │          │               │                │             │
 Ideation  4 Panels +       Spec Dept     5 Engineering     QA Dept +
  Dept      Mitigation       (9)          Teams (25)        SCC + Judges +
  (29)     Designer (45)                                    Auditors + Docs
                                                                (33)

Count: 3 Meta + 29 Ideation + 45 Panels + 9 Spec + 25 Engineering + 14 QA + 6 SCC + 5 Judges + 5 Auditors + 3 Docs = 144. All Opus 4.7, zero Sonnet/Haiku.
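
The headcount is easy to verify mechanically. The short check below uses only the numbers in the count line and the diagram; the dictionary name is illustrative.

```python
# Per-department headcount, copied from the count line above.
HEADCOUNT = {
    "Meta": 3, "Ideation": 29, "Panels": 45, "Spec": 9, "Engineering": 25,
    "QA": 14, "SCC": 6, "Judges": 5, "Auditors": 5, "Docs": 3,
}

total = sum(HEADCOUNT.values())
# The right-most branch of the diagram groups QA, SCC, Judges, Auditors, and Docs.
qa_branch = sum(HEADCOUNT[name] for name in ("QA", "SCC", "Judges", "Auditors", "Docs"))

print(total, qa_branch)  # prints "144 33"
```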

Requirements

  • Claude Code (latest) with Pro / Max / Team / Enterprise subscription. (No separate API key needed.)
  • Node.js 20 LTS + pnpm 9 (for scaffolded apps' build/test)
  • Docker 24+ (optional, for scaffolded apps' docker compose up verification)

What's inside the plugin

| Area | Count | Summary |
| --- | --- | --- |
| Agents | 144 | 10 departments, 6 tiers, all Opus 4.7 |
| Slash commands | 15 | /pf:* namespace |
| Hooks | 7 | factory-policy, askuser-enforcement, auto-retro-trigger, idea-drift-detector, cost-regression, escalation-ledger, post-h1-signal |
| Memory seed | 3 | CLAUDE.md + PROGRESS.md + LESSONS.md (with 3 bootstrap lessons) |
| Methodology | 1 | Layer-0: 10 non-negotiable rules |
| Asset templates | 12 | Docker Compose, Caddyfile, nestia.config.ts, install.sh + 8 standard-profile build templates |
| JSON schemas | 6 | preview-card, panel-vote, score-report, pf-profile, idea-spec, spec-anchor-audit |
| Seed ideas | 10 | Pre-verified demo scenarios |
| CLI | 1 | bin/pf |
| Verification | 1 | scripts/verify-plugin.sh |

Zero third-party services

Preview Forge's plugin runtime uses only Anthropic-native services:

  • Claude Code (Pro/Max) · Claude Opus 4.7 · Adaptive thinking · xhigh effort
  • Claude Managed Agents · Anthropic Memory Tool · Batch API · Files API · Citations
  • Context editing (context-management-2025-06-27) · Compaction (compact_20260112)
  • Prompt caching (1-hour TTL) · Fine-grained tool streaming · Task budgets (task-budgets-2026-03-13)
  • Claude Design (Gate H1 main) · Built-in Design Studio (Gate H1 fallback)

Not used in the plugin runtime or generated mockups: Figma, Google Fonts, external CDNs, hosted analytics services. All 26 mockups are single-file HTML with inline styles only.

Memory & cross-run learning

A 4-layer memory so mistakes don't repeat across runs:

  1. memory/CLAUDE.md — session rules (read first every run)
  2. memory/PROGRESS.md — run index (updated at run end)
  3. memory/LESSONS.md — failure catalog (auto-appended by Auto-retro critic)
  4. Anthropic Memory Tool (memory_20250818) — per-agent episodic memory (Reflexion pattern)

M1 Run Supervisor reads all four before every new run and pre-loads relevant lessons to every Department Lead.

Documentation

Verify install

git clone https://github.com/Two-Weeks-Team/PreviewForgeForClaudeCode
cd PreviewForgeForClaudeCode
bash scripts/verify-plugin.sh

License

Apache-2.0. See NOTICE for attribution.


Built with Claude Opus 4.7 · Powered by Claude Code Plugins · No third-party services in the plugin runtime · Apache-2.0

Preview Forge · Two-Weeks-Team

Preview is all you need.