Skip to content

docs: add behavior control positioning with safety evidence#348

Merged
bmdhodl merged 2 commits intomainfrom
feat/safety-narrative-behavior-control
Apr 14, 2026
Merged

docs: add behavior control positioning with safety evidence#348
bmdhodl merged 2 commits intomainfrom
feat/safety-narrative-behavior-control

Conversation

@bmdhodl
Copy link
Copy Markdown
Owner

@bmdhodl bmdhodl commented Apr 14, 2026

Summary

  • Add "Why Static Guards" section to README with three recent safety findings (one peer-reviewed, two from preview/preprint)
  • Position AgentGuard as behavior control (not just cost control)
  • Rule-based guards can't be socially engineered by the models they guard

Data points cited

  1. Mythos Preview (April 2026) - found vulnerabilities in every major OS/browser, triggered government emergency meeting
  2. Nature (2026) - (peer-reviewed) evidence of LLMs disabling oversight, scheming, leaving hidden notes
  3. War games (arXiv 2602.14740) - GPT-5.2, Claude Sonnet 4, Gemini 3 Flash showed spontaneous deception, 0% surrender, nuclear escalation

Test plan

  • All 672 existing tests pass
  • No SDK code changes (docs only)
  • PyPI README regenerated and in sync
  • Claims backed by cited sources

🤖 Generated with Claude Code

AgentGuard leads with cost control today. Three recent data points
(Mythos Preview government emergency, Nature peer-reviewed deception
evidence, arXiv war games nuclear escalation) validate the thesis
that static rule-based guards are the correct architecture for agent
safety. This adds a "Why static guards" section making that case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 14, 2026 13:58
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new documentation section positioning AgentGuard’s static, deterministic guards as “behavior control” and introduces supporting safety-related evidence in the README surfaces (GitHub + PyPI).

Changes:

  • Add a new “Why static guards” section describing behavior-control framing and deterministic guard benefits.
  • Cite three safety-related evidence points (Mythos Preview, a Nature paper, and an arXiv war-games preprint) in both README.md and the generated PyPI README.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
README.md Adds a new “Why static guards” section to reframe the product and list safety evidence.
sdk/PYPI_README.md Updates the generated PyPI README to include the same new section content.

Comment thread sdk/PYPI_README.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Address Copilot review on PR #348:
- Add (source) links to Mythos Preview, Nature, and arXiv war games citations
- Remove '(arXiv 2602.14740)' inline ref in favor of explicit link
- Apply to both README.md and sdk/PYPI_README.md
@bmdhodl bmdhodl merged commit fe6c5b6 into main Apr 14, 2026
12 checks passed
@bmdhodl bmdhodl deleted the feat/safety-narrative-behavior-control branch April 14, 2026 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants