Release: v2.36.0 — deterministic workflow engine#150

Merged: Data-Wise merged 19 commits into main from dev on Jun 13, 2026
@Data-Wise (Owner)

v2.36.0 — deterministic workflow engine

Minor release. 18 commits since v2.35.0.

Added

  • /craft:orchestrate:workflow — a third orchestration mode that runs a coded, fixed-control-flow program (parallel/pipeline/loop/verify) with stdlib-enforced structural output schemas, data-driven fan-out, a run-wide concurrency semaphore, and cached/resumable replay by run-ID.
    • workflow-engine skill + scripts/workflow_parse.py (stdlib-only core).
    • task-analyzer suggests :workflow on decompose→cover→verify→synthesize phrasing.
    • Tutorial, command/help/refcard/cookbook docs + 2 mermaid flowcharts, runnable example.

Fixed

  • Test suite no longer mutates the source tree: command-audit.sh --fix (stripped deprecated/replaced-by) and docs-staleness-check.sh --fix (rewrote count strings) both retargeted to isolated temp trees + structural guard.
  • pytest --strict-markers no longer red-lists all of CI when the mermaid/mermaid_mcp marker-provider plugin flakes — both markers registered in pyproject.toml.
  • Homebrew cask generator emits the canonical bare-symbol depends_on macos: :codename form.
  • test_on_expected_branch_type accepts fix/* branches.

Counts

110 commands · 39 skills · 8 agents

Pre-flight

  • pre-release-check.sh 2.36.0: PASSED
  • dev CI green on the bump commit (verify below before merge)
  • CHANGELOG.md ↔ docs/CHANGELOG.md mirrored; [Unreleased] → [2.36.0] — 2026-06-13

🤖 Generated with Claude Code

Test User and others added 19 commits June 3, 2026 23:52
The generated cask template used the deprecated string-comparison form
`depends_on macos: ">= :{min_macos}"`, which Homebrew now warns against.
Switch to the canonical bare-symbol form `depends_on macos: :{min_macos}`
(min_macos is always a macOS codename, so the symbol is valid and already
means "this version or newer").

This is the root cause that reintroduced the deprecation into generated
casks (cf. Data-Wise/homebrew-tap#112). Also fixed in the reference SPEC.
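
As a minimal sketch of the template change, assuming the generator is a Python function that formats a single line (the name `render_depends_on` and the codename check are hypothetical, not taken from the actual generator):

```python
def render_depends_on(min_macos: str) -> str:
    """Render a cask's macOS requirement in the bare-symbol form.

    `min_macos` is assumed to be a macOS codename such as "sonoma".
    """
    # Hypothetical guard: a codename must be usable as a bare Ruby symbol.
    if not min_macos.isidentifier():
        raise ValueError(f"not a macOS codename: {min_macos!r}")
    # Old template (deprecated): depends_on macos: ">= :{min_macos}"
    # New template: the bare symbol, read as "this version or newer".
    return f"depends_on macos: :{min_macos}"


print(render_depends_on("sonoma"))  # depends_on macos: :sonoma
```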

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Third orchestration mode /craft:orchestrate:workflow — code-driven
parallel/pipeline/loop control flow with schema-validated JSON per
agent. 8 design decisions locked via interrogation 2026-06-12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the full doc/discoverability surface ship-blocking, mirroring the
/craft:orchestrate:drive precedent (tutorial, command+help pages, REFCARD,
cookbook recipe, modes-compared 2->3, skills-agents catalog, mkdocs nav).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…output

brainstorm Step 5.5 now requires a 'Documentation & Discoverability'
section in every captured spec; orchestrate:plan template gains a
required docs phase + acceptance criterion. Mirrors the full doc surface
a shipped craft feature carries (tutorial, help, REFCARD, hub, website).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ot cause confirmed)

VALID_FIELDS omits deprecated/replaced-by; command-audit.sh --fix strips
them, and test_audit_fix_mode_dry runs --fix against the real tree.
Exact diffs + TDD plan for fix/command-audit-deprecated-fields.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…staleness --fix) (#147)

Bundles Bug A (command-audit --fix stripped deprecated/replaced-by from 56 commands) and Bug C (docs-staleness --fix rewrote count refs in 3 docs), both caused by tests invoking --fix against the real tree. Adds isolation helpers + enforcement test, fixes the fix/* branch-type test, and includes an intentional 108->109 count sync. All CI green; full suite 1672 passed / 0 failed, tree clean.
…; engine at Increment 5a

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…:workflow (3rd mode) (#148)

* docs(plan): ORCHESTRATE plan for workflow engine (5 increments, FR9 docs surface)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(orchestrate): workflow-engine mechanical core — parser + schema (Increment 1)

Increment 1 of the deterministic workflow engine (SPEC-workflow-engine-2026-06-12).
TDD-built, stdlib-only determinism core (PyYAML used only for input reading,
imported lazily inside parse_yaml so the core never touches it).

scripts/workflow_parse.py:
- YAML form + frozen shape-DSL form -> identical canonical wave plan (D1, D3).
  DSL is a tokenizer + recursive-descent reader (no eval), with map/flatMap
  path-[] agreement enforced so both forms provably converge.
- Structural output validator (D2 layer 1, gating): required keys, primitives,
  homogeneous arrays; bool rejected as number; one error per offending key.
- resolve_fanout: empty fan-out hard-aborts naming the upstream stage (D6/FR8).
- cache_key content hash + cascade_invalidate downstream propagation (D4).
- Run-wide semaphore as a counter FILE — read/acquire/release/reconcile (D5/FR7).
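
The structural validator described above could look roughly like the sketch below. It is illustrative only: the function names and the schema shape (key mapped to a primitive name, or a one-element list for a homogeneous array) are assumptions, not the actual `workflow_parse.py` API.

```python
def _matches(value, kind):
    """Check one value against a primitive kind name."""
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so reject it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "object":
        return isinstance(value, dict)
    return False


def validate_structure(output, schema):
    """Return one error string per offending key; an empty list means pass.

    Assumed schema shape: {"key": "string"} or {"key": ["string"]},
    where the list form means a homogeneous array of that kind.
    """
    errors = []
    for key, expected in schema.items():
        if key not in output:
            errors.append(f"{key}: missing required key")
        elif isinstance(expected, list):
            value = output[key]
            if not isinstance(value, list):
                errors.append(f"{key}: expected array")
            elif not all(_matches(item, expected[0]) for item in value):
                errors.append(f"{key}: array items must all be {expected[0]}")
        elif not _matches(output[key], expected):
            errors.append(f"{key}: expected {expected}")
    return errors
```

Using only `isinstance` checks keeps the gate free of third-party dependencies, in line with the stdlib-only constraint stated in the commit.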

commands/orchestrate/_workflow_schema.json: constrained v1 dialect (reference;
enforcement is the stdlib validator, no jsonschema).

tests/test_workflow_engine.py: 33 unit tests, red->green per decision (D1-D6).
Full `pytest tests/` green (1703 passed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(orchestrate): workflow-engine skill executor + CLI/gate hooks (Increment 2)

Increment 2 of the deterministic workflow engine — the prompt-driven executor
behind /craft:orchestrate:workflow (no runtime dep, like drive-engine).

skills/orchestration/workflow-engine/SKILL.md:
- Compile -> resolve fan-out -> dispatch file-scoped agents under the run-wide
  semaphore -> hybrid-gate every output -> cache/replay -> reconcile counter at
  each wave boundary (D5 residual-risk mitigation).
- Hard rule: delegate ALL deterministic mechanics to scripts/workflow_parse.py
  (plan, structural gate, cache key, cascade, fan-out, semaphore) — never
  improvise. Claude judges only the advisory semantic layer.
- First-class verify gate (D8): runs the project's real verify command, exit
  status authoritative; reuses drive-engine's auto-detection table so drive
  stays a strict subset.

scripts/workflow_parse.py:
- parse_file() + main() CLI: emits the canonical wave plan as JSON (the skill
  shells out to this) — Increment 2 gate "dry-run emits a sane wave plan".
- gate_output(): hybrid D2 gate, failure-isolating and non-raising. Structural
  miss -> ok=False (fail just that branch); semantic_warning surfaced but never
  flips the gate — proves the "semantic-warn non-blocking" gate.
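
A minimal sketch of that hybrid gate, under the assumption that the structural layer is a required-keys check and the semantic layer is an optional callable returning a warning string (the signature and result shape here are hypothetical):

```python
def gate_output(output, required_keys, semantic_check=None):
    """Hybrid gate sketch: structural layer gates, semantic layer only warns.

    Never raises; the caller inspects `ok` and fails just that branch.
    """
    if not isinstance(output, dict):
        errors = ["output is not an object"]
    else:
        errors = [
            f"{key}: missing required key"
            for key in required_keys
            if key not in output
        ]

    warning = None
    if not errors and semantic_check is not None:
        try:
            # Advisory only: whatever it returns, `ok` is not affected.
            warning = semantic_check(output)
        except Exception as exc:
            warning = f"semantic check failed: {exc}"

    return {"ok": not errors, "errors": errors, "semantic_warning": warning}
```

For example, `gate_output({"summary": "s"}, ["summary"], lambda o: "summary is very short")` passes the gate while still surfacing the warning.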

Counts: skill 38->39 pulled forward to here (plugin.json + CLAUDE.md) so
validate-counts stays green now that the skill exists; command count + docs
catalog remain Increment 3/5.

Tests: +6 (39 total in the file). validate-counts green; new SKILL.md
frontmatter valid, no duplicate trigger phrases; markdownlint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(orchestrate): /craft:orchestrate:workflow thin command (Increment 3)

Increment 3 — the thin entry point that owns args and delegates to the
workflow-engine skill (mirrors orchestrate:drive).

commands/orchestrate/workflow.md:
- args: workflow file, --dry-run/-n, --resume <run-id>, --refine (FR6 parity).
- 3-mode "which orchestration?" table (orchestrate / drive / workflow).
- Step-ordered execution: resolve def -> compile (delegate, never eyeball) ->
  dry-run box -> execute via skill -> resume replay -> authoritative verify gate.
- command-audit clean (0 errors/warnings); markdownlint clean.

scripts/workflow_parse.py:
- dry_run_actions(plan): one line per wave; statically-unknown fan-out width
  shown symbolically (xN), never a fabricated number (D1/FR5).
- main(): -n/--dry-run flag prints the plan + run-wide ceiling, no JSON.
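
To illustrate the symbolic-width rule, here is a small sketch. The plan shape (a `waves` list whose entries carry `stage` and an optional integer `width`) and the output wording are assumptions made for the example, not the real wave-plan format.

```python
def dry_run_actions(plan):
    """Return one human-readable line per wave of an assumed plan shape."""
    lines = []
    for index, wave in enumerate(plan["waves"], start=1):
        width = wave.get("width")
        # A width unknown before runtime is shown as xN, never guessed.
        shown = f"x{width}" if isinstance(width, int) else "xN"
        lines.append(f"wave {index}: {wave['stage']} ({shown})")
    return lines


plan = {
    "waves": [
        {"stage": "decompose", "width": 1},
        {"stage": "cover", "width": None},  # data-driven fan-out
    ]
}
print("\n".join(dry_run_actions(plan)))
# wave 1: decompose (x1)
# wave 2: cover (xN)
```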

.gitignore += .craft/workflow-runs/ (per-run agent outputs, manifest, semaphore).

Counts: command 109->110 (plugin.json + CLAUDE.md); validate-counts green
(110 cmds / 39 skills / 8 agents).

Tests: +3 dry-run (42 total in file).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(orchestrate): D7 known-shape router suggestion (Increment 4)

Increment 4 — task-analyzer detects the decompose→cover→verify→synthesize
shape and SUGGESTS /craft:orchestrate:workflow. Never silently switches
(routing-false-positive guard is the accepted D7 residual risk).

scripts/workflow_parse.py:
- detects_workflow_shape(text): conservative heuristic — fires only when ≥3 of
  the four stage categories (decompose/fan-out/verify/synthesize) appear, or the
  explicit decompose…synthesize chain. A lone "verify"/"parallel" stays silent.
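
The heuristic above might be sketched as follows. The trigger-word lists are illustrative guesses; the real table lives in the task-analyzer skill and may differ.

```python
import re

# Assumed trigger words per stage category (not the shipped table).
STAGE_WORDS = {
    "decompose": ("decompose", "break down"),
    "fan-out": ("parallel", "fan out", "fan-out", "cover"),
    "verify": ("verify", "validate"),
    "synthesize": ("synthesize", "summarize"),
}


def detects_workflow_shape(text):
    """Fire only on >= 3 stage categories or an explicit decompose...synthesize chain."""
    lowered = text.lower()
    hits = {
        stage
        for stage, words in STAGE_WORDS.items()
        if any(re.search(r"\b" + re.escape(word), lowered) for word in words)
    }
    chain = re.search(r"\bdecompose\b.*\bsynthesize\b", lowered, re.DOTALL)
    return len(hits) >= 3 or chain is not None
```

Requiring three categories is what keeps a lone "verify" or "parallel" silent, which matches the conservative, suggest-only behaviour the commit describes.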

skills/orchestration/task-analyzer/SKILL.md:
- New "Coded-Workflow Shape Detection (D7)" section: trigger-word table,
  suggest-don't-switch confirmation prompt, detected/not-detected examples.

Tests: +4 (46 total in file) — full shape fires, parallel+verify+summarize
fires, ordinary feature request silent, lone signal silent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(orchestrate): runnable example + verify-stage + D8 convergence test (Increment 5a)

scripts/workflow_parse.py:
- verify stages (type: verify) now carry their `command` through to the wave
  plan (D8) — a static gate whose exit status is authoritative.

examples/workflow-code-review/WORKFLOW-code-review-sweep.yaml:
- the spec's runnable 5-dim review case (decompose -> cover N -> verify 2/finding
  -> synthesize); parses and dry-runs cleanly.

tests/test_workflow_engine.py (+3, 49 total):
- verify stage compiles to a command-carrying static gate.
- D8 convergence guard: drive-engine AND workflow-engine skills both keep the
  "real command, exit status authoritative, green transcript insufficient"
  semantics — fails if either drifts.
- the example file parses to the canonical 4-wave shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(orchestrate): FR9 docs surface for :workflow + count sync (Increment 5b)

Ship-blocking documentation surface for /craft:orchestrate:workflow, mirroring
the orchestrate:drive set (FR9).

New pages:
- docs/commands/orchestrate-workflow.md — command reference (forms, schema, replay).
- docs/help/orchestrate-workflow.md — when-to-use, dry-run reading, failure table.
- docs/tutorials/TUTORIAL-orchestrate-workflow.md — hands-on (YAML + shape-DSL,
  --dry-run, --resume, verify gate).
- docs/reference/REFCARD-WORKFLOW.md — quick reference (precedent: REFCARD-CHECK).
- docs/cookbook/recipes/run-a-coded-workflow.md — recipe.

Updated surfaces:
- docs/tutorials/orchestrator-modes-compared.md — new "three orchestration modes"
  section (orchestrate / drive / workflow) distinct from the 4 execution modes.
- docs/REFCARD.md, docs/skills-agents.md (workflow-engine row + 38->39),
  commands/smart-help.md (Orchestration topic + Q&A + Quick Ref row), mkdocs.yml
  nav (5 entries), and the three orchestrator guides cross-linked.
- CHANGELOG.md + docs/CHANGELOG.md [Unreleased].

Count sync: docs-staleness-check.sh --fix swept the Tier 2 long-tail
(109->110 commands, 38->39 skills) across ~20 docs. Staleness now GREEN on all
phases (nav, counts, coverage, freshness); validate-counts green; mkdocs builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(orchestrate): expand --refine scope contract to 6 commands (FR6)

Increment 3 added --refine to /craft:orchestrate:workflow per spec FR6
("consistent with other craft commands"). test_refine_flag_scope pins the
exact sanctioned set, so the contract grows 5 -> 6. Update the expected set
and the CLAUDE.md note. Caught only by the full suite (test_plugin_e2e).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(orchestrate): adversarial-review hardening of the workflow engine

Adversarial pre-PR review surfaced real gaps; closed the genuine ones (TDD):

- build_deps(plan) (NEW): derives the stage dependency graph from over/input
  bindings. cascade_invalidate was tested in isolation but had no way to get
  `deps` from a real plan — the D4 cascade was effectively unreachable on the
  resume path. Now wired (skill references build_deps(plan)).
- Parse-time binding validation: a binding to an unknown or FORWARD stage now
  hard-errors at compile time, naming the stage — was silently accepted and
  only caught at runtime.
- flatten() DSL operator: D3's vocabulary lists map/flatMap/flatten; flatten
  was rejected as "unknown operator". Now accepted (requires a [] path).
- loop max_iter: carried into the wave plan (was dropped).
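
A rough sketch of how a dependency graph derived from bindings can feed the cascade. The plan shape (a `stages` list with `name` and optional `over`/`input` binding strings such as `decompose.items[]`) is an assumption for illustration, and the real function signatures may differ.

```python
def build_deps(plan):
    """Map each stage name to the set of upstream stages it binds to."""
    deps = {}
    for stage in plan["stages"]:
        upstream = set()
        for field in ("over", "input"):
            binding = stage.get(field)
            if binding:
                # "decompose.items[]" -> "decompose"
                head = binding.split(".", 1)[0]
                upstream.add(head[:-2] if head.endswith("[]") else head)
        deps[stage["name"]] = upstream
    return deps


def cascade_invalidate(changed, deps):
    """Return `changed` plus every stage transitively downstream of it."""
    invalid = set(changed)
    grew = True
    while grew:
        grew = False
        for stage, upstream in deps.items():
            if stage not in invalid and upstream & invalid:
                invalid.add(stage)
                grew = True
    return invalid
```

On resume, a changed cache key for one stage would then invalidate only that stage and its dependents, leaving unrelated stages replayable from cache.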

Disclosed as v1 limitations (command doc): no dry-run static-max-vs-ceiling
warning (FR7 runtime semaphore is the core), loop is represented-not-executed,
and shape-DSL agent() requires an options object.

Tests: +6 (55 total in file), all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(orchestrate): remove ORCHESTRATE working artifact before merge

The ORCHESTRATE-*.md plan is a feature-branch working artifact and must not
land on dev (per the plan's Done checklist + global workflow rule). The spec
remains at docs/specs/SPEC-workflow-engine-2026-06-12.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(orchestrate): guard binding resolution + sync counts (PR #148 review B1/B2)

B1 — _resolve_binding now fails closed as a structured WorkflowError when a
binding walks a missing/non-dict field (both the []-flatten and .field paths),
instead of letting a bare KeyError/TypeError abort the whole resolver on
heterogeneous object[] outputs. +2 regression tests (57 total in file).
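
The fail-closed behaviour in B1 can be sketched like this. The binding grammar (`stage.field[].field`) and the function body are assumptions for illustration; only the `WorkflowError` name comes from the commit message.

```python
class WorkflowError(Exception):
    """Structured error raised instead of a bare KeyError/TypeError."""


def resolve_binding(outputs, binding):
    """Resolve a binding such as "cover.findings[].id" to a list of values."""
    stage, _, path = binding.partition(".")
    if stage not in outputs:
        raise WorkflowError(f"binding {binding!r}: unknown stage {stage!r}")
    values = [outputs[stage]]
    for part in path.split(".") if path else []:
        flatten = part.endswith("[]")
        field = part[:-2] if flatten else part
        step = []
        for value in values:
            # Fail closed on a non-dict element or a missing field.
            if not isinstance(value, dict) or field not in value:
                raise WorkflowError(
                    f"binding {binding!r}: missing field {field!r}"
                )
            item = value[field]
            if flatten:
                if not isinstance(item, list):
                    raise WorkflowError(
                        f"binding {binding!r}: {field!r} is not an array"
                    )
                step.extend(item)
            else:
                step.append(item)
        values = step
    return values
```

With heterogeneous output such as `{"cover": {"findings": [{"id": 1}, "oops"]}}`, the sketch raises `WorkflowError` naming the binding rather than aborting with an uncaught `TypeError`.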

B2 — count drift: package.json 109→110 commands / 38→39 skills; plugin.json
breakdown 96+13 → 96+14 workflow (the PR's new orchestrate/workflow.md command)
so it sums to the authoritative 110.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test User <test@example.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rker CI flake

- PR #148 (deterministic workflow engine) shipped → dev caadd8d (squash, --admin)
  with B1/B2 review fixes (d1d0588); worktree + remote branch removed.
- Next Action A reset to the now-confirmed dev-red root cause: mermaid/mermaid_mcp
  pytest markers unregistered in pyproject → --strict-markers collection error when
  the env plugin install flakes. Tiny root-cause fix noted (register both markers).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Documentation gap analysis follow-up for the shipped workflow engine (PR #148):

- Count drift (110 commands / 39 skills): README.md (6 refs), docs/index.md
  (live landing), docs/cookbook/common/find-the-right-command.md, plus the
  categorical-subtotal headers bump-version.sh doesn't sweep —
  commands/hub.md `ORCHESTRATE (4)→(5)` + `Skills (38→39)`, and
  docs/skills-agents.md `### Orchestration (4)→(5)`.
- Discoverability: added the missing `/craft:orchestrate:workflow` row to the
  hub's orchestrate command list (it existed nowhere in hub.md).
- Flowcharts (the feature shipped with zero, while sibling orchestrate docs
  have them): a mode-selection decision flowchart in docs/help/orchestrate-workflow.md
  and an execution-pipeline flowchart in docs/commands/orchestrate-workflow.md.
  Both pass the mermaid pre-commit validator; no <br/> tags (repo is phasing them out).

validate-counts: PASS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_mermaid_dogfood.py (@pytest.mark.mermaid) and test_mermaid_e2e.py
(@pytest.mark.mermaid_mcp) used markers absent from pyproject's markers list.
With --strict-markers, that's only safe while an env pytest plugin registers
them at install time; when that install flakes, collection hard-errors and the
ENTIRE "Craft CI" run aborts (exit 2) — a CI-wide failure unrelated to any diff
(observed taking down PR #148 and dev's own merge-commit run on 2026-06-13).

Registering both markers makes collection deterministic regardless of plugin
availability. Full suite: 1729 passed, 38 skipped (mermaid_mcp tests skip
cleanly when the MCP server is absent), 1 xfailed.
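
The fix itself registers the markers in pyproject.toml. Purely for illustration, the same effect can be had in code through pytest's `pytest_configure` hook; the sketch below assumes a hypothetical `conftest.py`, and the marker descriptions are made up.

```python
# conftest.py (hypothetical location; the actual fix edits pyproject.toml)

def pytest_configure(config):
    """Register custom markers so --strict-markers never depends on a plugin."""
    # Descriptions after the colon are illustrative assumptions.
    config.addinivalue_line("markers", "mermaid: mermaid diagram validation tests")
    config.addinivalue_line("markers", "mermaid_mcp: tests that need the mermaid MCP server")
```

Either way, the markers are known at collection time, so a flaky plugin install can no longer turn an unregistered marker into a collection error.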

Co-authored-by: Test User <test@example.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… gaps shipped

- PR #149 merged → dev d21a1e4 (markers registered; --strict-markers flake gone).
- Doc-gap closure 8854470: counts 110/39, hub entry, 2 flowcharts.
- Next Action A resolved; flagged 2 leftover local branches needing manual -D.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Minor release. Highlights (full detail in CHANGELOG):
- Added: /craft:orchestrate:workflow — 3rd orchestration mode (deterministic,
  schema-gated, resumable) + workflow-engine skill + stdlib parser.
- Fixed: test-suite source mutation (command-audit + docs-staleness --fix),
  mermaid/mermaid_mcp --strict-markers CI flake, Homebrew cask depends_on macos
  symbol form.
- Counts: 110 commands, 39 skills, 8 agents.

Version stamped across 14 files; [Unreleased] → [2.36.0] in both CHANGELOGs.
pre-release-check 2.36.0: PASSED.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Data-Wise Data-Wise merged commit eb7e143 into main Jun 13, 2026
10 checks passed
Data-Wise pushed a commit that referenced this pull request Jun 13, 2026
PR #150 → main eb7e143, release tag v2.36.0, Homebrew tap bumped (110 commands).
Updated milestone, release_date, and Branch Status table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>