Project page: https://fu-max-boop.github.io/statebind-guard/
- Launch note: Visible context is not executable state
- Launch package: copy-paste proof, posts, and maintainer pitch
- Adoption receipt: separate-repository GitHub Action proof
- Adoption audit: pre-adoption scanner for third-party repositories
- Deployed corpus: sanitized release, CI, packaging, and adoption handoffs
- Scout campaign: 8 public AI/tooling repos ranked through GitHub API scout
- Target review: human gate for first adoption outreach targets
- Context evidence: current public issues behind feedback-only drafts
- Feedback drafts: manual-review feedback requests
- Maintainer packet: Pydantic AI feedback packet
- Roadmap / feedback: roadmap, adoption feedback
StateBind Guard is a small benchmark and checker for a simple failure mode in coding-agent handoffs:
A handoff can contain the right identifier and still fail if it does not preserve the binding from active target to semantic role to executable handle.
Try the claim before installing any hooks:
python -m pip install "git+https://github.com/FU-max-boop/statebind-guard.git@v0.1.38"
statebind proof

Expected shape:
bad_visible_unbound: FAIL
good_role_bound: PASS
For long-running coding agents, the basic memory unit should not only be a chunk or a summary. It should preserve executable bindings such as:
active PR -> comparison-base role -> exact commit SHA
active task -> failing-test role -> exact pytest selector
active patch -> current-file role -> exact file path
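
The difference between a visible identifier and a bound one can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `Binding` class, its field names, and the helper functions are assumptions made for the example and do not reflect the `statebind.json` schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    """Hypothetical record; field names are illustrative only."""
    target: str            # e.g. "active PR"
    role: Optional[str]    # e.g. "comparison-base"
    handle: Optional[str]  # e.g. an exact commit SHA


def is_visible(handoff_text: str, handle: str) -> bool:
    """Visibility only: the identifier appears somewhere in the handoff."""
    return handle in handoff_text


def is_bound(binding: Binding) -> bool:
    """Bound: the target carries both a role and a concrete handle."""
    return bool(binding.role) and bool(binding.handle)


sha = "3f2c1ab"  # placeholder SHA for illustration
handoff_text = f"We looked at {sha} and a few other commits while debugging."

# The SHA is mentioned in the text, but nothing says what it is for.
unbound = Binding(target="active PR", role=None, handle=None)
# The same SHA, now attached to a role on the active target.
bound = Binding(target="active PR", role="comparison-base", handle=sha)

print(is_visible(handoff_text, sha), is_bound(unbound))  # True False
print(is_visible(handoff_text, sha), is_bound(bound))    # True True
```

Both handoffs pass a visibility check; only the second tells the next agent which role the SHA plays.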
This repository packages five things:
- Benchmark artifact: seed, natural-handoff, failure-corpus, and deployed-derived snippets with baselines, result cards, and tests.
- Practical handoff tool: a Codex-compatible statebind-handoff skill plus a lightweight local script for generating and checking executable handoffs.
- CI-ready validator: a dependency-free statebind CLI that emits structured findings for versioned statebind.json contracts.
- Policy-as-code gate: a small .statebind-policy.json file for required roles, confidence floors, and risk requirements.
- GitHub Action: a composite action that validates handoff contracts and writes JSON/SARIF/Markdown reports for CI and code scanning.
External receipt: a separate public repository, statebind-guard-adoption-example, pins the released action and has a passing workflow run with JSON, SARIF, Markdown, and HTML artifacts: run 26735088397.
Release receipt: each v* tag builds and verifies both a Python wheel and
source distribution, then attaches those assets to the GitHub release. The
local gate is:
make dist-check

Coding agents increasingly resume work from summaries, retrieval contexts, memory files, and tool traces. These handoffs often preserve narrative context but lose operational state: the next agent may know what happened but not exactly which file, command, commit, PR, issue, or artifact to act on.
StateBindBench turns this into an interaction-aware evaluation target for human-centered coding agents:
Can the next agent preserve the executable binding needed to act safely?
For a quick technical screen, this repository should answer three questions:
- Can the artifact be run locally?
- Does it expose a concrete agent failure mode?
- Does it state the boundary of the claim?
Run:
python -m pip install "git+https://github.com/FU-max-boop/statebind-guard.git@v0.1.38"
statebind proof
bash scripts/run_smoke_test.sh
make benchmark

Then inspect:
- launch note
- launch package
- quick demo
- adoption examples
- deployed corpus result card
- GitHub scout campaign result card
- adoption target review
- adoption context evidence
- adoption feedback request drafts
- Pydantic AI maintainer feedback packet
- roadmap
- adoption feedback
- failure cases
- limitations
Add StateBind Guard to any repository in about 30 seconds:
python -m pip install "git+https://github.com/FU-max-boop/statebind-guard.git@v0.1.38"
statebind audit --repo . --markdown statebind-adoption-audit.md
statebind audit --repo . --issue-template statebind-maintainer-note.md
statebind audit --repo-url https://github.com/owner/repo --issue-template statebind-maintainer-note.md
statebind scout --repo-url https://github.com/owner/repo --issue-dir statebind-notes --markdown statebind-scout.md --result-card statebind-scout-card.md
statebind scout --github-repo owner/repo --issue-context --issue-context-card statebind-issue-context.md --feedback-packet statebind-feedback-packet.md --issue-dir statebind-notes --markdown statebind-scout.md --result-card statebind-scout-card.md
statebind proof
statebind capture-github-run --run-url https://github.com/owner/repo/actions/runs/123 --next-command "make test" --out statebind-ci.json --handoff HANDOFF.ci.md
statebind capture-worktree --next-command "make test" --active-file src/app.py --out statebind-local.json --handoff HANDOFF.local.md
statebind init --goal "keep coding-agent handoffs executable" --next-command "make test" --policy-out .statebind-policy.json --pre-commit-config .pre-commit-config.yaml
statebind install-hook --policy .statebind-policy.json
statebind doctor
git add HANDOFF.md statebind.json .statebind-policy.json .pre-commit-config.yaml .github/workflows/statebind-guard.yml

This creates a ready-to-run handoff contract, policy file, standard pre-commit
config, and GitHub Actions workflow that publishes JSON/SARIF validation reports
on future pushes and pull requests.
The optional local hook blocks commits when statebind.json loses its
executable binding contract. statebind doctor audits whether the repo has the
contract, human handoff, GitHub Action, pre-commit config, local hook, and
validation gate wired correctly. statebind policy creates a team-editable
policy file for required roles and confidence thresholds, with presets for
bug fixes, CI failures, releases, migrations, and benchmark runs.
For stricter bug-fix handoffs, run
statebind policy --preset bugfix --out .statebind-policy.json.
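
To make required roles and confidence floors concrete, here is a small sketch of what such a gate could check. The policy keys (`required_roles`, `min_confidence`) and the binding fields are assumptions for illustration; the real `.statebind-policy.json` layout is defined by the repository's policy schema and may differ.

```python
from typing import Dict, List


def check_policy(bindings: List[Dict], policy: Dict) -> List[str]:
    """Return human-readable findings; an empty list means the handoff passes."""
    findings = []
    present = {b["role"] for b in bindings}
    # Every role the policy requires must appear in the handoff.
    for role in policy.get("required_roles", []):
        if role not in present:
            findings.append(f"missing required role: {role}")
    # Every binding must meet the confidence floor.
    floor = policy.get("min_confidence", 0.0)
    for b in bindings:
        if b.get("confidence", 0.0) < floor:
            findings.append(f"low confidence for role: {b['role']}")
    return findings


# Hypothetical policy and bindings, not taken from the repository.
policy = {"required_roles": ["failing-test", "current-file"], "min_confidence": 0.8}
bindings = [
    {"role": "current-file", "handle": "src/app.py", "confidence": 0.9},
    {"role": "failing-test", "handle": "tests/test_app.py::test_login", "confidence": 0.6},
]

for finding in check_policy(bindings, policy):
    print(finding)  # low confidence for role: failing-test
```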
The GitHub Action also exposes passed, errors, warnings, and exit_code
outputs, so downstream workflow steps can route failed handoffs without parsing
artifacts.
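
One way to picture how a fail-on threshold could map findings to those four outputs is sketched below. The finding shape, the severity ranking, and the exit-code convention are assumptions; the action's actual semantics may differ.

```python
from typing import Dict, List

# Assumed ordering: an "error" outranks a "warning".
SEVERITY_RANK = {"warning": 1, "error": 2}


def summarize(findings: List[Dict], fail_on: str = "error") -> Dict:
    """Collapse findings into passed/errors/warnings/exit_code (hypothetical)."""
    errors = sum(1 for f in findings if f["severity"] == "error")
    warnings = sum(1 for f in findings if f["severity"] == "warning")
    threshold = SEVERITY_RANK[fail_on]
    # The run fails if any finding is at or above the chosen threshold.
    failed = any(SEVERITY_RANK[f["severity"]] >= threshold for f in findings)
    return {
        "passed": not failed,
        "errors": errors,
        "warnings": warnings,
        "exit_code": 1 if failed else 0,
    }


findings = [{"severity": "warning", "message": "confidence below floor"}]
print(summarize(findings, fail_on="error"))
print(summarize(findings, fail_on="warning"))
```

Under these assumptions, the same single warning passes with a threshold of `error` and fails with a threshold of `warning`, which is the distinction a downstream step would route on.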
Use it with the standard pre-commit framework:
repos:
  - repo: https://github.com/FU-max-boop/statebind-guard
    rev: v0.1.38
    hooks:
      - id: statebind-guard

Run the smoke demo:
statebind proof
bash scripts/run_smoke_test.sh

Run the local quality checks:
make test
make benchmark
make dist-check
make public-check

See quick demo for the benchmark result summary, and see pre-commit usage and
GitHub Action usage for local and CI gate examples.
See policy usage for team-specific gates and scenario
presets.
See adoption examples for a separate public
repository that consumes the released GitHub Action and verifies action outputs
in CI.
Run statebind audit --repo . first when evaluating a repository that has not
adopted StateBind yet; it reports the current adoption level, handoff-like files,
the smallest inferred test command, and the smallest copy-paste adoption PR.
Use --repo-url https://github.com/owner/repo to clone a public repository into
a temporary checkout and generate the same audit without manually cloning it;
add --clone-timeout 60 for slow hosts.
Use statebind scout when comparing several candidate repositories; it ranks
targets by adoption priority and can write one maintainer-safe draft note per
useful target. Use --github-repo owner/repo or --github-list when large
GitHub repositories are too expensive to clone; this API mode scans the Git tree
and a small set of configuration files. Set GH_TOKEN or GITHUB_TOKEN before
larger campaigns to avoid unauthenticated GitHub API rate limits. Add
--result-card when you want a compact evidence card for a review thread,
launch note, or maintainer discussion. Add --issue-context with
--issue-context-card to search public GitHub issues for handoff/resume context
before writing a maintainer-facing feedback request. Use --issue-context-term
to override the default search terms when a target has a more specific durable
execution surface. Add --feedback-packet when you want a generated,
human-review-gated maintainer feedback packet from the same scout evidence.
Use statebind capture-github-run inside a failing GitHub Actions job, or with
--run-url, to bind the run URL, workflow/job, commit SHA, ref, conclusion,
and exact next command into a runtime handoff artifact for the next debugging
actor.
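
As a rough picture of what binding a CI run involves, the sketch below assembles a run record from GitHub Actions' default environment variables. The `capture_run` function and the output keys are hypothetical and are not the format written by `statebind capture-github-run`. The run conclusion is omitted here because it is not one of these default variables.

```python
from typing import Dict, Mapping


def capture_run(env: Mapping[str, str], next_command: str) -> Dict[str, str]:
    """Build a run record from GitHub Actions default environment variables."""
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    repo = env["GITHUB_REPOSITORY"]
    run_id = env["GITHUB_RUN_ID"]
    return {
        "run_url": f"{server}/{repo}/actions/runs/{run_id}",
        "workflow": env.get("GITHUB_WORKFLOW", ""),
        "job": env.get("GITHUB_JOB", ""),
        "commit_sha": env["GITHUB_SHA"],
        "ref": env.get("GITHUB_REF", ""),
        "next_command": next_command,
    }


# Placeholder values; inside a real job, pass os.environ instead.
fake_env = {
    "GITHUB_REPOSITORY": "owner/repo",
    "GITHUB_RUN_ID": "123",
    "GITHUB_SHA": "0123456789abcdef0123456789abcdef01234567",
    "GITHUB_REF": "refs/heads/main",
    "GITHUB_WORKFLOW": "ci",
    "GITHUB_JOB": "test",
}

record = capture_run(fake_env, next_command="make test")
print(record["run_url"])  # https://github.com/owner/repo/actions/runs/123
```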
Use statebind capture-worktree before handing off local coding-agent work to
bind the current branch, head SHA, staged/modified/untracked files, active file,
and exact next command without leaking absolute local paths.
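
The staged/modified/untracked split can be derived from `git status --porcelain`, which reports paths relative to the repository root rather than as absolute paths. The parser below is a minimal sketch under that assumption; `parse_porcelain` is a hypothetical helper, not the tool's code, and it does not unpack rename entries of the form `old -> new`.

```python
from typing import Dict, List


def parse_porcelain(status_text: str) -> Dict[str, List[str]]:
    """Classify `git status --porcelain` (v1) lines by state."""
    result: Dict[str, List[str]] = {"staged": [], "modified": [], "untracked": []}
    for line in status_text.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY path"
        # Column 1 is the index state, column 2 the worktree state.
        index_state, worktree_state, path = line[0], line[1], line[3:]
        if index_state == "?" and worktree_state == "?":
            result["untracked"].append(path)
            continue
        if index_state not in (" ", "?"):
            result["staged"].append(path)
        if worktree_state not in (" ", "?"):
            result["modified"].append(path)
    return result


# Sample output as git would print it; in practice this would come from
# subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).
sample = "M  src/app.py\n M README.md\n?? notes.txt\n"
print(parse_porcelain(sample))
# {'staged': ['src/app.py'], 'modified': ['README.md'], 'untracked': ['notes.txt']}
```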
Use it directly in a GitHub workflow:
- uses: FU-max-boop/statebind-guard@v0.1.38
  with:
    handoff: HANDOFF.md
    statebind-json: statebind.json
    policy: .statebind-policy.json
    fail-on: warning

Install the CLI locally:
python -m pip install -e .
statebind demo

Generate and validate a machine-readable handoff:
statebind extract \
  --repo . \
  --transcript examples/codex-handoff-demo/transcript.md \
  --repo-label . \
  --transcript-label examples/codex-handoff-demo/transcript.md \
  --out HANDOFF.md \
  --json statebind.json
statebind validate statebind.json \
  --repo . \
  --policy .statebind-policy.json \
  --fail-on error \
  --report statebind-validation.json \
  --sarif statebind-validation.sarif \
  --summary statebind-summary.md \
  --html-report statebind-report.html \
  --github-annotations
statebind doctor --repo .

Install the local Codex skill:
bash scripts/install_codex_skill.sh

Then ask Codex:
Use statebind-handoff to create a HANDOFF.md for this coding task.
action.yml                          # reusable GitHub composite action
statebind_handoff/
  statebind_handoff.py              # dependency-free handoff helper
integrations/codex-skill/
  statebind-handoff/                # installable Codex skill
examples/
  visible-id-unbound/               # minimal mechanism demo
  codex-handoff-demo/               # toy transcript for handoff generation
docs/
  case_studies/
  industrial_adoption.md
  failure_cases.md
  github_action_usage.md
  handoff_contract.md
  adoption_audit.md
  adoption_feedback.md
  launch_package.md
  launch_note.md
  limitations.md
  policy_usage.md
  pre_commit_usage.md
  quality_gates.md
  quick_demo.md
  roadmap.md
  result_cards/
  research_brief.md
schemas/
  statebind.schema.json             # versioned machine contract
  statebind-policy.schema.json      # versioned policy contract
data/
  statebind_guard_seed_benchmark.json
  statebind_guard_natural_handoff_benchmark.json
  statebind_guard_failure_corpus.json
  statebind_guard_github_scout_campaign_2026_05_31.json
  statebind_guard_adoption_target_review_2026_05_31.json
Generate a draft handoff from a transcript and current repo state:
python statebind_handoff/statebind_handoff.py extract \
  --repo . \
  --transcript examples/codex-handoff-demo/transcript.md \
  --repo-label . \
  --transcript-label examples/codex-handoff-demo/transcript.md \
  --out HANDOFF.md \
  --json statebind.json

Check a handoff:
python statebind_handoff/statebind_handoff.py check HANDOFF.md

Validate the machine-readable contract:
python statebind_handoff/statebind_handoff.py validate statebind.json --repo . --json

Print the JSON schema:
python statebind_handoff/statebind_handoff.py schema

Print the core visible-but-unbound demo:
python statebind_handoff/statebind_handoff.py demo

Run the benchmarks:
make benchmark

The included seed and natural-handoff benchmarks evaluate whether a checker can
reject handoffs that mention the right file, command, PR, test, artifact, or SHA
but fail to bind it to the active role. On the included real-shaped snippets,
statebind_guard beats both visibility and keyword-role baselines and reduces
unsafe accepts.
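
The notion of an unsafe accept can be illustrated with a toy comparison. The cases, labels, and both checkers below are invented for the example and are far simpler than the benchmark's baselines; they only show how the count is formed, not the reported results.

```python
from typing import Callable, Dict, List

SELECTOR = "tests/test_auth.py::test_login"

# Hypothetical labeled cases: `safe` is True only when the handle is bound
# to the active role, not merely mentioned.
cases: List[Dict] = [
    {"text": f"Next step: rerun the failing test {SELECTOR}.",
     "handle": SELECTOR, "role_bound": True, "safe": True},
    {"text": f"Earlier we also ran {SELECTOR} among other tests.",
     "handle": SELECTOR, "role_bound": False, "safe": False},
]


def visibility_baseline(case: Dict) -> bool:
    """Accept whenever the handle appears in the handoff text."""
    return case["handle"] in case["text"]


def binding_checker(case: Dict) -> bool:
    """Accept only when the handle appears and is bound to the active role."""
    return case["handle"] in case["text"] and case["role_bound"]


def unsafe_accepts(checker: Callable[[Dict], bool], items: List[Dict]) -> int:
    """Count handoffs the checker accepts even though they are unsafe."""
    return sum(1 for c in items if checker(c) and not c["safe"])


print(unsafe_accepts(visibility_baseline, cases))  # 1
print(unsafe_accepts(binding_checker, cases))      # 0
```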
The larger failure corpus adds coverage across wrong-file, wrong-test, wrong-commit, wrong-PR, stale-artifact, config/environment, dataset-version, run-ID, branch-name, risky-command, and multi-binding failures. See failure corpus result card and the schema/report upgrade case study. The GitHub scout campaign card shows a real authenticated API scan over eight public AI/tooling repositories and preserves the human-review boundary before any outreach.
StateBind is an executable handoff contract:
active target -> semantic role -> executable handle
It is not a claim that retrieval always fails. Retrieval can work when the active record and executable handle are cleanly exposed. The claim is narrower: relevance and visibility are insufficient metrics for reliable agent handoff because they do not guarantee that the next agent has preserved the action-relevant binding.
The artifact is checked against six practical gates:
- Research gate: the thesis is legible and does not overclaim.
- Tool gate: a user can generate and check a handoff in minutes.
- Evidence gate: each binding has role, handle, evidence, confidence, and risk.
- Public-release gate: no obvious local paths, secrets, or accidental generated files.
- Outreach gate: the README, brief, examples, and slides make the work discussable.
- Adoption gate: a separate public repository consumes the released action and publishes validation artifacts.
See docs/quality_gates.md for the full checklist.
If StateBind Guard helps your work, cite CITATION.cff. If you try it in a real repository, open an adoption report or a sanitized failure-case issue. Real friction and negative results are useful; the goal is to collect evidence about where executable handoff contracts help and where they are too strict, too weak, or missing integrations.
This repository is a public tool/benchmark release. It intentionally excludes paper PDFs, review packages, and large raw supplements.
- Fully natural deployed-agent handoff corpus.
- Learned StateBind construction from messy traces.
- Binding-aware memory interfaces for coding agents.
- Integration with Codex, Claude Code, OpenHands, and repository RAG systems.
- Safety evaluation for wrong-object coding-agent actions.