FU-max-boop/statebind-guard

StateBind Guard

License: MIT

  • Project page: https://fu-max-boop.github.io/statebind-guard/
  • Launch note: Visible context is not executable state
  • Launch package: copy-paste proof, posts, and maintainer pitch
  • Adoption receipt: separate-repository GitHub Action proof
  • Adoption audit: pre-adoption scanner for third-party repositories
  • Deployed corpus: sanitized release, CI, packaging, and adoption handoffs
  • Scout campaign: 8 public AI/tooling repos ranked through GitHub API scout
  • Target review: human gate for first adoption outreach targets
  • Context evidence: current public issues behind feedback-only drafts
  • Feedback drafts: manual-review feedback requests
  • Maintainer packet: Pydantic AI feedback packet
  • Roadmap / feedback: roadmap, adoption feedback

StateBind Guard is a small benchmark and checker for a simple failure mode in coding-agent handoffs:

A handoff can contain the right identifier and still fail if it does not preserve the binding from active target to semantic role to executable handle.

Try the claim before installing any hooks:

python -m pip install "git+https://github.com/FU-max-boop/statebind-guard.git@v0.1.38"
statebind proof

Expected shape:

bad_visible_unbound: FAIL
good_role_bound: PASS

For long-running coding agents, the basic memory unit should not only be a chunk or a summary. It should preserve executable bindings such as:

active PR -> comparison-base role -> exact commit SHA
active task -> failing-test role -> exact pytest selector
active patch -> current-file role -> exact file path
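The three-part shape above can be held in a small record. The following is a minimal sketch, not the repository's data model: the `Binding` class, its field names, and the `parse_binding` helper are hypothetical and only illustrate that a line either carries all three parts or is not a binding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    """Hypothetical record for: active target -> semantic role -> executable handle."""
    target: str  # e.g. "active task"
    role: str    # e.g. "failing-test role"
    handle: str  # e.g. an exact pytest selector

    def render(self) -> str:
        """Render the binding in the arrow notation used in this README."""
        return f"{self.target} -> {self.role} -> {self.handle}"


def parse_binding(line: str) -> Optional[Binding]:
    """Return a Binding only when the line has exactly three non-empty parts."""
    parts = [part.strip() for part in line.split("->")]
    if len(parts) != 3 or not all(parts):
        return None
    return Binding(*parts)


bound = parse_binding("active task -> failing-test role -> tests/test_api.py::test_login")
unbound = parse_binding("see tests/test_api.py::test_login")  # identifier visible, no role
print(bound.render() if bound else None)
print(unbound)
```

The second line mentions the right selector but parses to `None`, which is the visible-but-unbound case in miniature.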

This repository packages five things:

  1. Benchmark artifact: seed, natural-handoff, failure-corpus, and deployed-derived snippets with baselines, result cards, and tests.
  2. Practical handoff tool: a Codex-compatible statebind-handoff skill plus a lightweight local script for generating and checking executable handoffs.
  3. CI-ready validator: a dependency-free statebind CLI that emits structured findings for versioned statebind.json contracts.
  4. Policy-as-code gate: a small .statebind-policy.json file for required roles, confidence floors, and risk requirements.
  5. GitHub Action: a composite action that validates handoff contracts and writes JSON/SARIF/Markdown reports for CI and code scanning.
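To make item 4 concrete, here is a rough sketch of what a policy gate over required roles and confidence floors could look like. The key names `required_roles` and `min_confidence`, and the shape of each binding dict, are assumptions for illustration; the real `.statebind-policy.json` schema may differ.

```python
from typing import Any, Dict, List

# Hypothetical policy shape; not the actual .statebind-policy.json schema.
POLICY: Dict[str, Any] = {
    "required_roles": ["failing-test", "current-file"],
    "min_confidence": 0.8,
}


def check_policy(bindings: List[Dict[str, Any]], policy: Dict[str, Any]) -> List[str]:
    """Return one finding per required role that is missing or below the floor."""
    findings: List[str] = []
    by_role = {b["role"]: b for b in bindings}
    for role in policy["required_roles"]:
        binding = by_role.get(role)
        if binding is None:
            findings.append(f"missing required role: {role}")
        elif binding["confidence"] < policy["min_confidence"]:
            findings.append(f"confidence below floor for role: {role}")
    return findings


bindings = [
    {"role": "failing-test", "handle": "tests/test_api.py::test_login", "confidence": 0.6},
]
for finding in check_policy(bindings, POLICY):
    print(finding)
```

Keeping the thresholds in data rather than in code is what lets a team edit the gate without touching the checker.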

External receipt: a separate public repository, statebind-guard-adoption-example, pins the released action and has a passing workflow run with JSON, SARIF, Markdown, and HTML artifacts: run 26735088397.

Release receipt: each v* tag builds and verifies both a Python wheel and source distribution, then attaches those assets to the GitHub release. The local gate is:

make dist-check

Why This Matters

Coding agents increasingly resume work from summaries, retrieval contexts, memory files, and tool traces. These handoffs often preserve narrative context but lose operational state: the next agent may know what happened but not exactly which file, command, commit, PR, issue, or artifact to act on.

StateBindBench turns this into an interaction-aware evaluation target for human-centered coding agents:

Can the next agent preserve the executable binding needed to act safely?

5-Minute Proof Gate

For a quick technical screen, this repository should answer three questions:

  1. Can the artifact be run locally?
  2. Does it expose a concrete agent failure mode?
  3. Does it state the boundary of the claim?

Run:

python -m pip install "git+https://github.com/FU-max-boop/statebind-guard.git@v0.1.38"
statebind proof
bash scripts/run_smoke_test.sh
make benchmark

Then inspect:

Quick Start

Add StateBind Guard to any repository in about 30 seconds:

python -m pip install "git+https://github.com/FU-max-boop/statebind-guard.git@v0.1.38"
statebind audit --repo . --markdown statebind-adoption-audit.md
statebind audit --repo . --issue-template statebind-maintainer-note.md
statebind audit --repo-url https://github.com/owner/repo --issue-template statebind-maintainer-note.md
statebind scout --repo-url https://github.com/owner/repo --issue-dir statebind-notes --markdown statebind-scout.md --result-card statebind-scout-card.md
statebind scout --github-repo owner/repo --issue-context --issue-context-card statebind-issue-context.md --feedback-packet statebind-feedback-packet.md --issue-dir statebind-notes --markdown statebind-scout.md --result-card statebind-scout-card.md
statebind proof
statebind capture-github-run --run-url https://github.com/owner/repo/actions/runs/123 --next-command "make test" --out statebind-ci.json --handoff HANDOFF.ci.md
statebind capture-worktree --next-command "make test" --active-file src/app.py --out statebind-local.json --handoff HANDOFF.local.md
statebind init --goal "keep coding-agent handoffs executable" --next-command "make test" --policy-out .statebind-policy.json --pre-commit-config .pre-commit-config.yaml
statebind install-hook --policy .statebind-policy.json
statebind doctor
git add HANDOFF.md statebind.json .statebind-policy.json .pre-commit-config.yaml .github/workflows/statebind-guard.yml

This creates a ready-to-run handoff contract, policy file, standard pre-commit config, and GitHub Actions workflow that publishes JSON/SARIF validation reports on future pushes and pull requests. The optional local hook blocks commits when statebind.json loses its executable binding contract.

statebind doctor audits whether the repo has the contract, human handoff, GitHub Action, pre-commit config, local hook, and validation gate wired correctly. statebind policy creates a team-editable policy file for required roles and confidence thresholds, with presets for bug fixes, CI failures, releases, migrations, and benchmark runs. For stricter bug-fix handoffs, run statebind policy --preset bugfix --out .statebind-policy.json.

The GitHub Action also exposes passed, errors, warnings, and exit_code outputs, so downstream workflow steps can route failed handoffs without parsing artifacts.

Use it with the standard pre-commit framework:

repos:
  - repo: https://github.com/FU-max-boop/statebind-guard
    rev: v0.1.38
    hooks:
      - id: statebind-guard

Run the smoke demo:

statebind proof
bash scripts/run_smoke_test.sh

Run the local quality checks:

make test
make benchmark
make dist-check
make public-check

See quick demo for the benchmark result summary, and pre-commit usage and GitHub Action usage for local and CI gate examples. See policy usage for team-specific gates and scenario presets. See adoption examples for a separate public repository that consumes the released GitHub Action and verifies action outputs in CI.

Notes on the commands above:

  • Run statebind audit --repo . first when evaluating a repository that has not adopted StateBind yet; it reports the current adoption level, handoff-like files, the smallest inferred test command, and the smallest copy-paste adoption PR.
  • Use --repo-url https://github.com/owner/repo to clone a public repository into a temporary checkout and generate the same audit without manually cloning it; add --clone-timeout 60 for slow hosts.
  • Use statebind scout when comparing several candidate repositories; it ranks targets by adoption priority and can write one maintainer-safe draft note per useful target.
  • Use --github-repo owner/repo or --github-list when large GitHub repositories are too expensive to clone; this API mode scans the Git tree and a small set of configuration files.
  • Set GH_TOKEN or GITHUB_TOKEN before larger campaigns to avoid unauthenticated GitHub API rate limits.
  • Add --result-card when you want a compact evidence card for a review thread, launch note, or maintainer discussion.
  • Add --issue-context with --issue-context-card to search public GitHub issues for handoff/resume context before writing a maintainer-facing feedback request.
  • Use --issue-context-term to override the default search terms when a target has a more specific durable execution surface.
  • Add --feedback-packet when you want a generated, human-review-gated maintainer feedback packet from the same scout evidence.
  • Use statebind capture-github-run inside a failing GitHub Actions job, or with --run-url, to bind the run URL, workflow/job, commit SHA, ref, conclusion, and exact next command into a runtime handoff artifact for the next debugging actor.
  • Use statebind capture-worktree before handing off local coding-agent work to bind the current branch, head SHA, staged/modified/untracked files, active file, and exact next command without leaking absolute local paths.
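For readers curious what a worktree capture involves, the sketch below collects the same kinds of facts with plain git commands. It is an illustration under stated assumptions, not the implementation behind statebind capture-worktree: the function names and the snapshot keys are hypothetical. It relies on `git status --porcelain`, which prints two status characters, a space, and a repository-relative path, so no absolute local paths enter the snapshot.

```python
import subprocess
from typing import Any, Dict, List


def parse_porcelain(output: str) -> Dict[str, List[str]]:
    """Split `git status --porcelain` (v1) output into staged/modified/untracked."""
    state: Dict[str, List[str]] = {"staged": [], "modified": [], "untracked": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue
        index_flag, worktree_flag, path = line[0], line[1], line[3:]
        if index_flag == "?" and worktree_flag == "?":
            state["untracked"].append(path)
            continue
        if index_flag != " ":
            state["staged"].append(path)
        if worktree_flag != " ":
            state["modified"].append(path)
    return state


def git(*args: str) -> str:
    """Run a git command in the current directory and return raw stdout."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout


def capture_snapshot(next_command: str) -> Dict[str, Any]:
    """Hypothetical snapshot of branch, head SHA, file state, and next command."""
    return {
        "branch": git("rev-parse", "--abbrev-ref", "HEAD").strip(),
        "head_sha": git("rev-parse", "HEAD").strip(),
        "files": parse_porcelain(git("status", "--porcelain")),
        "next_command": next_command,
    }


sample = "M  src/app.py\n M README.md\n?? notes.txt\n"
print(parse_porcelain(sample))
```

Calling `capture_snapshot("make test")` inside a git checkout would return the full snapshot; the sample string is used here so the parsing step can be read without a repository.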

Use it directly in a GitHub workflow:

- uses: FU-max-boop/statebind-guard@v0.1.38
  with:
    handoff: HANDOFF.md
    statebind-json: statebind.json
    policy: .statebind-policy.json
    fail-on: warning

Install the CLI locally:

python -m pip install -e .
statebind demo

Generate and validate a machine-readable handoff:

statebind extract \
  --repo . \
  --transcript examples/codex-handoff-demo/transcript.md \
  --repo-label . \
  --transcript-label examples/codex-handoff-demo/transcript.md \
  --out HANDOFF.md \
  --json statebind.json

statebind validate statebind.json \
  --repo . \
  --policy .statebind-policy.json \
  --fail-on error \
  --report statebind-validation.json \
  --sarif statebind-validation.sarif \
  --summary statebind-summary.md \
  --html-report statebind-report.html \
  --github-annotations

statebind doctor --repo .
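The --sarif report targets the SARIF 2.1.0 format that GitHub code scanning accepts. As a rough sketch of that format only, the function below wraps findings in a minimal SARIF log. The finding keys (`rule`, `level`, `message`, `file`) and the tool name are hypothetical; the report that statebind validate actually writes may be structured differently.

```python
import json
from typing import Any, Dict, List


def to_sarif(findings: List[Dict[str, str]]) -> Dict[str, Any]:
    """Wrap findings in a minimal SARIF 2.1.0 log with a single run."""
    rule_ids = sorted({f["rule"] for f in findings})
    results = [
        {
            "ruleId": f["rule"],
            "level": f["level"],  # SARIF levels: "none", "note", "warning", "error"
            "message": {"text": f["message"]},
            "locations": [
                {"physicalLocation": {"artifactLocation": {"uri": f["file"]}}}
            ],
        }
        for f in findings
    ]
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                # "statebind-sketch" is a placeholder tool name for this example.
                "tool": {"driver": {"name": "statebind-sketch",
                                    "rules": [{"id": rid} for rid in rule_ids]}},
                "results": results,
            }
        ],
    }


findings = [
    {"rule": "unbound-handle", "level": "error",
     "message": "handle is visible but not bound to a role", "file": "statebind.json"},
]
print(json.dumps(to_sarif(findings), indent=2))
```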

Install the local Codex skill:

bash scripts/install_codex_skill.sh

Then ask Codex:

Use statebind-handoff to create a HANDOFF.md for this coding task.

Repository Layout

action.yml                              # reusable GitHub composite action

statebind_handoff/
  statebind_handoff.py               # dependency-free handoff helper

integrations/codex-skill/
  statebind-handoff/                 # installable Codex skill

examples/
  visible-id-unbound/                # minimal mechanism demo
  codex-handoff-demo/                # toy transcript for handoff generation

docs/
  case_studies/
  industrial_adoption.md
  failure_cases.md
  github_action_usage.md
  handoff_contract.md
  adoption_audit.md
  adoption_feedback.md
  launch_package.md
  launch_note.md
  limitations.md
  policy_usage.md
  pre_commit_usage.md
  quality_gates.md
  quick_demo.md
  roadmap.md
  result_cards/
  research_brief.md

schemas/
  statebind.schema.json              # versioned machine contract
  statebind-policy.schema.json       # versioned policy contract

data/
  statebind_guard_seed_benchmark.json
  statebind_guard_natural_handoff_benchmark.json
  statebind_guard_failure_corpus.json
  statebind_guard_github_scout_campaign_2026_05_31.json
  statebind_guard_adoption_target_review_2026_05_31.json

Use With Codex

Generate a draft handoff from a transcript and current repo state:

python statebind_handoff/statebind_handoff.py extract \
  --repo . \
  --transcript examples/codex-handoff-demo/transcript.md \
  --repo-label . \
  --transcript-label examples/codex-handoff-demo/transcript.md \
  --out HANDOFF.md \
  --json statebind.json

Check a handoff:

python statebind_handoff/statebind_handoff.py check HANDOFF.md

Validate the machine-readable contract:

python statebind_handoff/statebind_handoff.py validate statebind.json --repo . --json

Print the JSON schema:

python statebind_handoff/statebind_handoff.py schema

Print the core visible-but-unbound demo:

python statebind_handoff/statebind_handoff.py demo

Run the benchmarks:

make benchmark

The included seed and natural-handoff benchmarks evaluate whether a checker can reject handoffs that mention the right file, command, PR, test, artifact, or SHA but fail to bind it to the active role. On the included real-shaped snippets, statebind_guard beats both visibility and keyword-role baselines and reduces unsafe accepts.
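The difference between the baselines and a binding check can be seen on a few toy cases. Everything below is invented for illustration: the cases are not drawn from the benchmark data, and the three checker functions are simplified stand-ins rather than the repository's baselines or its statebind_guard checker.

```python
from typing import Any, Callable, Dict, List

Case = Dict[str, Any]

# Toy cases; "safe" marks whether accepting the handoff would be correct.
CASES: List[Case] = [
    {"text": "Ran tests/test_api.py::test_login earlier.\nThe failing test is still unknown.",
     "role": "failing test", "handle": "tests/test_api.py::test_login", "safe": False},
    {"text": "Failing test: tests/test_api.py::test_login",
     "role": "failing test", "handle": "tests/test_api.py::test_login", "safe": True},
    {"text": "Touched src/app.py and src/db.py.",
     "role": "current file", "handle": "src/app.py", "safe": False},
]


def visibility(case: Case) -> bool:
    """Accept whenever the handle string appears anywhere."""
    return case["handle"] in case["text"]


def keyword_role(case: Case) -> bool:
    """Accept when the handle and the role words both appear, anywhere."""
    return visibility(case) and case["role"] in case["text"].lower()


def same_line_binding(case: Case) -> bool:
    """Accept only when one line carries both the role words and the handle."""
    return any(case["handle"] in line and case["role"] in line.lower()
               for line in case["text"].splitlines())


def unsafe_accepts(checker: Callable[[Case], bool], cases: List[Case]) -> int:
    """Count handoffs the checker accepts even though they are not safe."""
    return sum(1 for c in cases if checker(c) and not c["safe"])


for checker in (visibility, keyword_role, same_line_binding):
    print(checker.__name__, unsafe_accepts(checker, CASES))
```

On these three cases the counts are 2, 1, and 0 respectively. That says nothing about real benchmark numbers; it only shows what an unsafe accept is and why co-occurrence of a role word and a handle is weaker than a binding between them.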

The larger failure corpus adds coverage across wrong-file, wrong-test, wrong-commit, wrong-PR, stale-artifact, config/environment, dataset-version, run-ID, branch-name, risky-command, and multi-binding failures. See failure corpus result card and the schema/report upgrade case study. The GitHub scout campaign card shows a real authenticated API scan over eight public AI/tooling repositories and preserves the human-review boundary before any outreach.

What StateBind Is And Is Not

StateBind is an executable handoff contract:

active target -> semantic role -> executable handle

It is not a claim that retrieval always fails. Retrieval can work when the active record and executable handle are cleanly exposed. The claim is narrower: relevance and visibility are insufficient metrics for reliable agent handoff because they do not guarantee that the next agent has preserved the action-relevant binding.

Quality Gates

The artifact is checked against six practical gates:

  1. Research gate: the thesis is legible and does not overclaim.
  2. Tool gate: a user can generate and check a handoff in minutes.
  3. Evidence gate: each binding has role, handle, evidence, confidence, and risk.
  4. Public-release gate: no obvious local paths, secrets, or accidental generated files.
  5. Outreach gate: the README, brief, examples, and slides make the work discussable.
  6. Adoption gate: a separate public repository consumes the released action and publishes validation artifacts.
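The evidence gate (item 3) names five fields per binding. A completeness check along those lines might look like the sketch below; the dict layout and the example values are assumptions, and docs/quality_gates.md remains the authoritative checklist.

```python
from typing import Any, Dict, List

# Field names taken from the evidence gate above; the dict layout is assumed.
REQUIRED_FIELDS = ("role", "handle", "evidence", "confidence", "risk")


def missing_fields(binding: Dict[str, Any]) -> List[str]:
    """List required fields that are absent, None, or empty strings."""
    return [name for name in REQUIRED_FIELDS if binding.get(name) in (None, "")]


binding = {
    "role": "comparison-base",
    "handle": "3f2a9c1",  # placeholder SHA for illustration
    "evidence": "",
    "confidence": 0.9,
}
print(missing_fields(binding))
```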

See docs/quality_gates.md for the full checklist.

Citation And Feedback

If StateBind Guard helps your work, cite CITATION.cff. If you try it in a real repository, open an adoption report or a sanitized failure-case issue. Real friction and negative results are useful; the goal is to collect evidence about where executable handoff contracts help and where they are too strict, too weak, or missing integrations.

Current Status

This repository is a public tool/benchmark release. It intentionally excludes paper PDFs, review packages, and large raw supplements.

Future Directions

  • Fully natural deployed-agent handoff corpus.
  • Learned StateBind construction from messy traces.
  • Binding-aware memory interfaces for coding agents.
  • Integration with Codex, Claude Code, OpenHands, and repository RAG systems.
  • Safety evaluation for wrong-object coding-agent actions.

About

Catch visible-but-unbound coding-agent handoffs: CLI + GitHub Action with proof, policy gates, SARIF, HTML, and benchmark cards.
