Game-creation skills + twine/webgamedev sandbox images #23
Merged
Add two locally-built, lazily-built sandbox images following the browserimage/mediaimage pattern, backing offline game build+playtest:

- twine (demesne-twine): Playwright/Chromium+Node base + the Tweego interactive-fiction compiler and bundled Twine story formats (Harlowe, SugarCube, ...), TWEEGO_PATH set; usable at egress=none.
- webgamedev (demesne-webgamedev): same Playwright base + a warm Phaser+Vite+TypeScript template at /opt/game-template with node_modules pre-installed; usable at egress=none.

Fast-moving versions (Tweego, Phaser, Vite, TypeScript, the Playwright tag/npm version) are build ARGs with pinned defaults that join the ImageBuilder content-hash cache key. Wires both into images.go (consts + allowedImageNames), runner.go resolveImage, the child and host tool-schema image descriptions, manifest.json, and all docs; adds two integration tests behind the integration build tag.
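To illustrate why build ARGs belong in a content-hash cache key, here is a minimal sketch. It is not the ImageBuilder code: `imageCacheKey`, the ARG names, and the Dockerfile text are hypothetical, and the 32-bit FNV-1a digest is only a dependency-free stand-in for whatever digest the real builder uses.

```typescript
// Stand-in digest for this sketch only: 32-bit FNV-1a over UTF-16 code units.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Hypothetical helper: the key covers the Dockerfile text plus every build ARG.
// ARGs are sorted by name so the key does not depend on map insertion order.
function imageCacheKey(dockerfile: string, buildArgs: Record<string, string>): string {
  const argLines = Object.keys(buildArgs)
    .sort()
    .map((name) => `${name}=${buildArgs[name]}`);
  return fnv1a([dockerfile, ...argLines].join("\n"));
}

// Hypothetical Dockerfile text and ARG names, for illustration only.
const dockerfileText = "FROM playwright-base\nARG PHASER_VERSION\nARG VITE_VERSION\n";

const pinned = imageCacheKey(dockerfileText, { PHASER_VERSION: "4.2.0", VITE_VERSION: "8.1.0" });
const reordered = imageCacheKey(dockerfileText, { VITE_VERSION: "8.1.0", PHASER_VERSION: "4.2.0" });
const bumped = imageCacheKey(dockerfileText, { PHASER_VERSION: "4.2.1", VITE_VERSION: "8.1.0" });

console.log(pinned === reordered); // same pins in a different order: same key
console.log(pinned === bumped); // one version bumped: different key, so the image rebuilds
```

Under these assumptions, bumping a pinned default changes the key and triggers a rebuild, while an unchanged Dockerfile with unchanged pins keeps hitting the cached image.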
Two demesne creation skills that turn a concept into a finished, playable game and deliver a durable bundle the recipient can play and keep editing (artifact-as-product, not a repo branch): sandbox-make-twine-game compiles Twee 3 to a self-contained HTML via Tweego with a link-graph playtest; sandbox-make-ts-game scaffolds a Phaser+Vite+TypeScript project, builds it, and playtests it by driving scripted inputs. Both run an improvement-cycle model — correctness invariants (compiles, no broken links, canvas readies, no console errors) held every iteration, with quality pursued open-endedly until diminishing returns rather than passed at a minimum bar. Use the twine and webgamedev sandbox images added in this change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
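A link-graph playtest of this kind can be sketched as plain graph analysis once the passages and their links have been extracted. The sketch below is illustrative, not the skill's harness: `PassageGraph`, `checkLinkGraph`, and the sample passages are hypothetical. It reads "every path ends" as "from every reachable passage, some ending (a passage with no outgoing links) can still be reached", which is an assumption about the intended invariant.

```typescript
// Passage name -> names of the passages it links to.
type PassageGraph = Map<string, string[]>;

interface LinkReport {
  brokenLinks: string[]; // "From -> To" where To is not a defined passage
  unreachable: string[]; // passages never reached from the start passage
  trapped: string[]; // reachable passages from which no ending can be reached
}

function checkLinkGraph(graph: PassageGraph, start: string): LinkReport {
  const names: string[] = [];
  graph.forEach((_targets, name) => names.push(name));

  const brokenLinks: string[] = [];
  for (const from of names) {
    for (const to of graph.get(from) ?? []) {
      if (!graph.has(to)) brokenLinks.push(`${from} -> ${to}`);
    }
  }

  // Breadth-first walk from the start passage, following only links that resolve.
  const reachable = new Set<string>();
  const queue: string[] = graph.has(start) ? [start] : [];
  while (queue.length > 0) {
    const name = queue.shift() as string;
    if (reachable.has(name)) continue;
    reachable.add(name);
    for (const to of graph.get(name) ?? []) {
      if (graph.has(to)) queue.push(to);
    }
  }

  // Fixed point: a passage "can end" if it has no links, or links to one that can.
  const canEnd = new Set<string>();
  let grew = true;
  while (grew) {
    grew = false;
    for (const name of names) {
      if (canEnd.has(name)) continue;
      const targets = graph.get(name) ?? [];
      if (targets.length === 0 || targets.some((to) => canEnd.has(to))) {
        canEnd.add(name);
        grew = true;
      }
    }
  }

  const unreachable = names.filter((name) => !reachable.has(name));
  const trapped = names.filter((name) => reachable.has(name) && !canEnd.has(name));
  return { brokenLinks, unreachable, trapped };
}

// Hypothetical story: "Cave" loops on itself and links to a missing "Pit";
// "Attic" is defined but nothing links to it.
const story: PassageGraph = new Map([
  ["Start", ["Forest", "Cave"]],
  ["Forest", ["Ending"]],
  ["Cave", ["Cave", "Pit"]],
  ["Ending", []],
  ["Attic", ["Ending"]],
]);

const report = checkLinkGraph(story, "Start");
console.log(JSON.stringify(report));
// {"brokenLinks":["Cave -> Pit"],"unreachable":["Attic"],"trapped":["Cave"]}
```

A clean story would produce three empty lists, which makes the report usable as a per-iteration correctness invariant.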
Running both skills end-to-end surfaced three issues, now fixed in both SKILL.md files:

(1) A child's /out is isolated at /out/child/<name>, so children now write screenshots, reports, and per-phase summaries under the shared /workspace and the orchestrator relays them into /out. Previously the per-step instructions told children to write to /out/gallery and /out/SUMMARY.md, stranding them.
(2) sandbox_agent children (write, implement, reviewers, fix) must run background:true + sandbox_wait: a synchronous agent child that runs past the ~300s MCP idle timeout aborts on the orchestrator side while continuing to run, racing on shared /workspace.
(3) sandbox-make-ts-game's fallback now names image=browser for the playtest (image=node has no browser) and corrects the npm-install cost (~10-20s, not glacial).

Also clarifies that the Twine playtest harness must identify passages from runtime state, since Harlowe does not render the passage name in the DOM.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Direct end-to-end test of the twine image showed Tweego rejects a format-version it doesn't bundle. The image ships Tweego 2.1.1's bundled formats (Harlowe 3.1.0, SugarCube 2.30.0, etc.), so the :: StoryData format-version must pin one of those — a newer Harlowe (3.3.x) fails the compile. Documented in the skill with how to list what's available. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tweego 2.1.1 bundles years-stale story formats (Harlowe 3.1.0, SugarCube 2.30.0). The twine Dockerfile now overlays the current format.js for each — Harlowe 3.3.9 and SugarCube 2.37.3, from the latest Twine 2 release (2.12.0) that ships them — onto Tweego's major-version format directories (harlowe-3, sugarcube-2), each pinned via an ARG with a build-time version assertion. The integration test now also asserts the overlaid Harlowe version is present, and the skill's version note is updated. Verified: the demesne-twine image rebuilds and the smoke test passes with Harlowe 3.3.9. webgamedev needs no change — its Phaser 4.2.0 / Vite 8.1.0 / TypeScript 6.0.3 are already the latest. The Playwright base stays 1.60.0 to keep sharing the cached base layer with the browser image (latest is 1.61.1; bumping it is a browser-image decision). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
All three Playwright-based images share the mcr.microsoft.com/playwright base, so they move together to keep that base layer cached once across them: base tag v1.60.0-noble -> v1.61.1-noble and the global playwright npm package 1.60.0 -> 1.61.1 (the current latest). Verified: all three images rebuild on the new base and their smoke tests pass (browser renders a React widget, twine compiles with Harlowe 3.3.9, webgamedev builds the template). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…review pipeline

Generalises the skill from a one-shot small-game build to one that scales to a multi-system game. A game-design phase turns the brief into a detailed spec whose system decomposition is the phase plan (right-sizing the build), then each vertical-slice phase is built, build-checked, scenario-playtested, and reviewed before the next, with a final cross-system cohesion pass. Validation scales via a window.__game test API and a growing scenarios.json suite (the game analogue of build-widget's journey.json) re-run every phase.

Folds in four fixes surfaced by an end-to-end platformer run:

(1) The orchestrator scaffolds /opt/game-template via a webgamedev sandbox_script and implement agents edit src/ directly. The template lives only in the webgamedev image, so an agent image could not copy it.
(2) One canonical scenario schema (name/setup/steps/check with a token vocabulary and an indirect-eval JS check) that the design emits and the harness consumes.
(3) The design decomposition must be dependency-consistent (a scenario's window.__game methods and state values introduced no later than its milestone, no self-contradictory checks).
(4) A determinism-explicit reference playtest.cjs skeleton (manual rAF queue + virtual clock + seeded RNG, boot-by-stepping, indirect-eval check).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
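The determinism ingredients named above (manual rAF queue, virtual clock, seeded RNG) can be sketched in isolation. This is not the skill's playtest.cjs: `createDeterministicHost` and its method names are hypothetical, the 16 ms frame length is an assumption, and the generator is a small mulberry32-style PRNG chosen for brevity. In a browser harness, equivalents would presumably be installed over `requestAnimationFrame`, `performance.now`, and `Math.random` before the game boots.

```typescript
type FrameCallback = (timeMs: number) => void;

interface DeterministicHost {
  requestAnimationFrame(cb: FrameCallback): number;
  now(): number;
  random(): number;
  step(frames: number): void;
}

// Hypothetical factory: time only advances when step() is called.
function createDeterministicHost(seed: number, frameMs: number = 16): DeterministicHost {
  let queue: FrameCallback[] = [];
  let frame = 0;
  let rngState = seed >>> 0;
  return {
    // Manual rAF queue: callbacks wait here until the harness steps a frame.
    requestAnimationFrame(cb: FrameCallback): number {
      queue.push(cb);
      return queue.length;
    },
    // Virtual clock: derived from the frame counter, never from wall time.
    now(): number {
      return frame * frameMs;
    },
    // Seeded RNG (mulberry32-style): same seed, same sequence.
    random(): number {
      rngState = (rngState + 0x6d2b79f5) >>> 0;
      let t = rngState;
      t = Math.imul(t ^ (t >>> 15), t | 1);
      t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
      return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    },
    // Run exactly `frames` frames; callbacks registered during a frame run in the next one.
    step(frames: number): void {
      for (let i = 0; i < frames; i++) {
        frame += 1;
        const due = queue;
        queue = [];
        for (const cb of due) cb(frame * frameMs);
      }
    },
  };
}

// A toy loop that re-registers itself each frame, as a game loop would.
function runDemo(seed: number): number[] {
  const host = createDeterministicHost(seed);
  const trace: number[] = [];
  const tick: FrameCallback = (timeMs) => {
    trace.push(timeMs + host.random());
    host.requestAnimationFrame(tick);
  };
  host.requestAnimationFrame(tick);
  host.step(3);
  return trace;
}

const firstRun = runDemo(42);
const secondRun = runDemo(42);
console.log(firstRun.length, JSON.stringify(firstRun) === JSON.stringify(secondRun)); // 3 true
```

Boot-by-stepping fits the same shape: instead of waiting on wall-clock time for the game to become ready, the harness calls `step` until a readiness condition holds.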
A second end-to-end platformer run validated the implement-via-agent path but found five harness issues, now folded into the skill:

(1) The reference playtest.cjs skeleton shipped a broken eval: it bound g/state as evaluate-locals while using indirect eval, which runs in global scope and cannot see them, so every check threw ReferenceError. It now assigns g/state onto globalThis before the indirect eval.
(2) The harness now reloads the page per scenario (page.goto inside the loop) so each starts from clean state and a clock reset to 0. Running the whole suite on one page let accumulated physics bodies and the virtual clock perturb later scenarios.
(3) press:<input> is documented as a one-frame tap, so a scripted press:jump is a short hop and full jumps need hold:jump:N.
(4) Dependency-consistency now covers the other content a milestone introduces (a getEnemyCount()===0 check is unsatisfiable in a cavern that milestone fills with enemies), preferring outcome-class checks over brittle absolutes.
(5) Timing-sensitive scenarios are documented as needing a deterministic frame-probe and outcome-class assertions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
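The indirect-eval failure in (1) is easy to reproduce outside a browser. The sketch below is a standalone illustration, not the playtest.cjs code: `runCheckBroken`, `runCheckFixed`, and the `GameApi` shape are hypothetical, and the cleanup in `finally` is an addition of this sketch.

```typescript
// Hypothetical stand-in for the window.__game test API.
interface GameApi {
  getScore(): number;
}

// Broken shape: `g` is a function-local, but indirect eval runs the check in
// global scope, where no `g` exists.
function runCheckBroken(g: GameApi, check: string): unknown {
  void g; // deliberately unused: the check string cannot see this local
  return (0, eval)(check);
}

// Fixed shape: publish g/state on globalThis first, then run the indirect eval.
function runCheckFixed(g: GameApi, state: Record<string, unknown>, check: string): unknown {
  const scope = globalThis as unknown as Record<string, unknown>;
  scope.g = g;
  scope.state = state;
  try {
    return (0, eval)(check);
  } finally {
    // Cleanup is this sketch's choice, so one check cannot leak into the next.
    delete scope.g;
    delete scope.state;
  }
}

const game: GameApi = { getScore: () => 3 };

let brokenError = "";
try {
  runCheckBroken(game, "g.getScore() === 3");
} catch (err) {
  brokenError = (err as Error).name;
}

const fixedResult = runCheckFixed(game, { level: 1 }, "g.getScore() === 3 && state.level === 1");
console.log(brokenError, fixedResult); // ReferenceError true
```

Indirect eval keeps check strings from reaching into harness internals, at the cost that everything a check needs has to be placed in global scope explicitly.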
Adds two non-coder creation skills that turn a concept into a finished, playable game and deliver a durable bundle (artifact-as-product, not a repo branch), plus the two baked sandbox images they run on. Toolchains pinned at current-latest.

Skills (`examples/skills/`)

- `sandbox-make-twine-game`: branching interactive fiction. Twee 3 → Tweego compile → an offline link-graph playtest (every choice resolves, no orphan/unreachable passages, every path ends) → a story-quality improvement cycle. Delivers the editable `.twee` (imports into the Twine editor) + a self-contained playable `index.html`.
- `sandbox-make-ts-game`: coded real-time game, design-first and phased. A game-design phase turns the brief into a detailed spec whose system decomposition is the phase plan (right-sizing from a one-mechanic toy to a multi-system game); each vertical slice is built, scenario-playtested, and reviewed before the next, with a final cross-system cohesion pass. Validation scales via a `window.__game` test API + a growing `scenarios.json` suite (the game analogue of build-widget's `journey.json`). Delivers the editable TypeScript project + runnable `dist/`.

Both separate correctness invariants (build/compile, no broken links, no console errors; held every iteration) from quality (pursued through an open-ended improvement cycle to diminishing returns, not a minimum bar).
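As a rough illustration of how a `scenarios.json` entry and the dependency-consistency rule might fit together, here is a sketch. The PR names the four schema fields (name/setup/steps/check) and the `press:<input>` / `hold:<input>:N` tokens but not their types, so everything else is an assumption: the `milestone` field, the field types, `framesForStep`, `findDependencyViolations`, and `getPlayerY` are hypothetical.

```typescript
// Assumed shape of one scenarios.json entry.
interface Scenario {
  name: string;
  setup: string[]; // assumed: setup tokens
  steps: string[]; // tokens such as "press:jump" or "hold:jump:12"
  check: string; // JS expression over g (window.__game) and state
  milestone: number; // assumed: the phase in which the scenario must first pass
}

// press:<input> is a one-frame tap; hold:<input>:N keeps the input down for N frames.
function framesForStep(token: string): { input: string; frames: number } {
  const parts = token.split(":");
  const verb = parts[0];
  const input = parts[1];
  if (verb === "press" && parts.length === 2 && input) return { input, frames: 1 };
  if (verb === "hold" && parts.length === 3 && input) {
    const frames = Number(parts[2]);
    if (Number.isInteger(frames) && frames > 0) return { input, frames };
  }
  throw new Error(`unknown step token: ${token}`);
}

// Flags checks that call a g.<method>() introduced after the scenario's milestone.
function findDependencyViolations(scenarios: Scenario[], introducedAt: Map<string, number>): string[] {
  const violations: string[] = [];
  for (const scenario of scenarios) {
    const calls = scenario.check.match(/\bg\.\w+\s*\(/g) ?? [];
    for (const call of calls) {
      const method = call.replace(/^g\./, "").replace(/\s*\($/, "");
      const milestone = introducedAt.get(method);
      if (milestone === undefined) {
        violations.push(`${scenario.name}: ${method} is never introduced`);
      } else if (milestone > scenario.milestone) {
        violations.push(`${scenario.name}: ${method} arrives in milestone ${milestone}, after milestone ${scenario.milestone}`);
      }
    }
  }
  return violations;
}

const suite: Scenario[] = [
  { name: "short hop", setup: [], steps: ["press:jump"], check: "g.getPlayerY() < 100", milestone: 1 },
  { name: "clear the cavern", setup: [], steps: ["hold:jump:12"], check: "g.getEnemyCount() === 0", milestone: 1 },
];
const introducedAt = new Map<string, number>([
  ["getPlayerY", 1],
  ["getEnemyCount", 2],
]);

const violations = findDependencyViolations(suite, introducedAt);
console.log(framesForStep("press:jump").frames, framesForStep("hold:jump:12").frames); // 1 12
console.log(violations);
// [ 'clear the cavern: getEnemyCount arrives in milestone 2, after milestone 1' ]
```

A static pass like this only catches API-ordering problems. Checks that are unsatisfiable because of level content still need design review.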
Images (`internal/sandbox/`)

Two locally-built, lazily-built images following the `browserimage`/`mediaimage` pattern, both on the shared Playwright/Chromium+Node base so it's cached once across `browser`/`twine`/`webgamedev`:

- `twine`: Tweego + the current Twine story formats (Harlowe 3.3.9, SugarCube 2.37.3, …) overlaid onto Tweego's stale bundled ones.
- `webgamedev`: a warm Phaser 4.2.0 + Vite 8.1.0 + TypeScript 6.0.3 template at `/opt/game-template` (`node_modules` baked), exposing `#game-ready` + `data-game-state` markers for offline playtests.

Fast-moving versions are build `ARG`s folded into the content-hash cache key. Wiring kept in sync across `images.go`, the runner, `childserver.go`/`server.go` tool schemas, `manifest.json`, the `docs/reference/tools` schemas, `README`/`docs`/`CHANGELOG`, plus unit-test parity and per-image `//go:build integration` smoke tests.

Validation
- `make validate` green (build + golangci-lint + unit); `make test-integration` builds all three images under real podman and passes their smoke tests (incl. asserting Harlowe 3.3.9 is in the `twine` image).
- End-to-end `webgamedev` game runs, including a 7-phase platformer ("Lantern Hollow") that validated the design-first decomposition, the implement-via-agent path, the growing scenario suite, and interspersed review (which caught real defects, e.g. a geometrically un-jumpable level gap). Friction those runs surfaced is folded back into the skills (scaffold ownership, one canonical scenario schema, a determinism-correct reference harness).

🤖 Generated with Claude Code