fix(safety): rm -rf bypass — long-form & split flags evade CRITICAL classification (#123) by SakethSumanBathini · Pull Request #128 · sreerevanth/AgentWatch

SakethSumanBathini · 2026-06-03T20:01:21Z

Closes #123

Summary

The recursive-delete guard can be evaded by rewording the flags. The most severe case, rm --recursive --force /, is classified SAFE and executes — a complete bypass of the engine's highest-priority rule. Related forms (rm -r -f /, rm -rf --no-preserve-root /) are misclassified as HIGH instead of CRITICAL, which is only fail-safe on the sync path and becomes exploitable the moment an approval callback is registered.

This PR adds a flag-spelling-agnostic normalizer so any recursive and forced rm aimed at a critical path is classified CRITICAL and hard-blocked, regardless of how the flags are written.

Root cause

FS_DELETE_CRITICAL in BUILTIN_RISK_PATTERNS (agentwatch/core/safety.py:53) is:

r"rm\s+-[rf]+\s*(\/|~|\.\.|\$HOME|\$PWD|/home|/etc|/usr|/var|/bin|/sbin|/boot)"

Two structural assumptions make it evadable:

-[rf]+ requires the recursive and force bits to live in one adjacent token. rm -r -f / (split) and rm --recursive --force / (long-form) don't match.
The flag token must be immediately followed by the path (\s* between them). rm -rf --no-preserve-root / puts a flag in between, so the path isn't adjacent.

When the CRITICAL pattern misses, the command falls through to the HIGH pattern rm\s+-[rf]+ (safety.py:89, block_by_default=False) — or, for long-form flags, matches nothing and scores SAFE.
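
To make the fall-through concrete, here is a minimal sketch that runs the two patterns quoted above against each variant. `classify` is a hypothetical stand-in for the engine's pattern loop. It assumes `re.search` with no flags and that CRITICAL is checked before HIGH; it is not the actual code in safety.py.

```python
import re

# Patterns copied from the description above (safety.py:53 and safety.py:89).
FS_DELETE_CRITICAL = re.compile(
    r"rm\s+-[rf]+\s*(\/|~|\.\.|\$HOME|\$PWD|/home|/etc|/usr|/var|/bin|/sbin|/boot)"
)
FS_DELETE_HIGH = re.compile(r"rm\s+-[rf]+")


def classify(command: str) -> str:
    """Hypothetical pattern loop: the most severe matching pattern wins."""
    if FS_DELETE_CRITICAL.search(command):
        return "CRITICAL"
    if FS_DELETE_HIGH.search(command):
        return "HIGH"
    return "SAFE"


for cmd in (
    "rm -rf /",                     # CRITICAL
    "rm -r -f /",                   # HIGH: "-r" is followed by "-f", not a path
    "rm -rf --no-preserve-root /",  # HIGH: a flag sits between "-rf" and "/"
    "rm --recursive --force /",     # SAFE: "--" never satisfies "-[rf]+"
):
    print(f"{cmd!r:<32} -> {classify(cmd)}")
```

The results line up with the pre-fix table below: only the literal `rm -rf /` spelling reaches CRITICAL.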

Exact pre-fix behavior (measured, `DEFAULT_POLICY`)

With an approval callback registered (the normal production configuration) and the callback approving:

Command	Classified	Outcome
`rm -rf /`	CRITICAL	hard-blocked, callback never consulted ✅
`rm -r -f /`	HIGH	downgraded to an approval prompt → approved → RUNS ⚠️
`rm -rf --no-preserve-root /`	HIGH	same — approvable → RUNS ⚠️
`rm --recursive --force /`	SAFE	no prompt at all → RUNS 🔴

So the guarantee "a recursive force-delete of / is always blocked" holds only for the literal rm -rf / spelling. The SAFE case is a full silent bypass; the HIGH cases are blocked only because the sync path fails safe when it can't prompt — once a real approver exists, they become approvable.

(Note: rm -fr / and rm -rf /etc are not bypasses — [rf]+ matches fr, and /etc is in the path list. They already classified CRITICAL and still do. They're included in the test matrix only to prevent regressions.)

The fix

A module-level helper in safety.py, rm_targets_critical_path(text) -> bool, tokenizes the command and independently detects:

recursive intent — -r, -R, --recursive, or r/R inside any -XYZ cluster
force intent — -f, --force, or f inside a cluster
a critical target — /, ~, .., $HOME, $PWD, /home, /etc, /usr, /var, /bin, /sbin, /boot (anchored, via _RM_CRITICAL_PATH_RE)

--no-preserve-root and -- are skipped so they can't mask the path. It returns True only when recursion and force and a critical target are all present, so benign recursive deletes (rm -rf ./node_modules, rm -r ./dist) are unaffected.
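
The detection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in safety.py: the names carry a `_sketch` suffix because they are hypothetical, whitespace splitting stands in for the real tokenizer, and "anchored" is assumed to mean "the token starts with a critical prefix" (which, like the original regex, treats any absolute path as critical because `/` is in the list).

```python
import re

# Assumption: prefix-anchored match; the real _RM_CRITICAL_PATH_RE may differ.
_CRITICAL_PATH_SKETCH_RE = re.compile(
    r"^(?:/|~|\.\.|\$HOME|\$PWD|/home|/etc|/usr|/var|/bin|/sbin|/boot)"
)


def rm_targets_critical_path_sketch(text: str) -> bool:
    """Return True only for rm + recursive + force + critical target."""
    parts = text.split()
    if "rm" not in parts:
        return False
    start = parts.index("rm")

    has_recursive = has_force = has_critical_target = False
    for arg in parts[start + 1:]:
        if arg in ("--", "--no-preserve-root"):
            continue  # skipped so they cannot mask the path
        if arg == "--recursive":
            has_recursive = True
        elif arg == "--force":
            has_force = True
        elif re.fullmatch(r"-[A-Za-z]+", arg):
            # Short flag or cluster such as -r, -R, -f, -rf, -fr, -rfv.
            has_recursive = has_recursive or "r" in arg or "R" in arg
            has_force = has_force or "f" in arg
        elif _CRITICAL_PATH_SKETCH_RE.match(arg):
            has_critical_target = True
    return has_recursive and has_force and has_critical_target


print(rm_targets_critical_path_sketch("rm --recursive --force /"))  # True
print(rm_targets_critical_path_sketch("rm -rf ./node_modules"))     # False
```

Tracking the three signals independently is what makes the check insensitive to flag spelling and ordering.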

It's wired into the two decision points that matter:

RiskScorer.score() (safety.py:330) — after the pattern loop, if the helper fires it forces matched_level = CRITICAL and appends the FS_DELETE_CRITICAL reason/policy. This fixes classification everywhere score() is consumed.
_evaluate_safety() block loop (safety.py:490) — after the block_by_default scan, if not block_immediate and rm_targets_critical_path(full_text): block_immediate = True. This guarantees a hard block independent of the DSL/approval flow, so it can't be downgraded to an approvable prompt.
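
A rough sketch of how the two hooks might look. Everything here is hypothetical scaffolding for illustration: `RiskLevel`, `apply_rm_override`, and `should_block_immediately` are made-up names, and the detector is passed in as a callable so the snippet stays self-contained. Only the control flow (override after the pattern loop, then force `block_immediate`) is taken from the description above.

```python
from enum import IntEnum
from typing import Callable, List


class RiskLevel(IntEnum):
    # Hypothetical levels; the real enum in safety.py may have more members.
    SAFE = 0
    HIGH = 3
    CRITICAL = 4


def apply_rm_override(
    matched_level: RiskLevel,
    reasons: List[str],
    text: str,
    detector: Callable[[str], bool],
) -> RiskLevel:
    """Post-pattern-loop override: escalate to CRITICAL when the detector fires."""
    if detector(text):
        matched_level = RiskLevel.CRITICAL
        if "FS_DELETE_CRITICAL" not in reasons:  # avoid recording the reason twice
            reasons.append("FS_DELETE_CRITICAL")
    return matched_level


def should_block_immediately(
    block_immediate: bool,
    full_text: str,
    detector: Callable[[str], bool],
) -> bool:
    """After the block_by_default scan: force a hard block if the detector fires."""
    if not block_immediate and detector(full_text):
        block_immediate = True
    return block_immediate
```

Putting the second check outside the approval flow is the design point: a command that reaches it is blocked before any callback can be asked to approve it.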

Scope note — `risk.py` also fixed (please review)

The issue targets safety.py, but agentwatch/core/risk.py has a second, independent dangerous-command scorer (_DANGEROUS_CMD, risk.py:15) with an even narrower pattern:

(re.compile(r"\brm\s+-rf?\s+/(?:\s|$)"), 95)

This matches only rm -rf / / rm -r / and misses -fr, split flags, long-form, and /etc. Since score_event() is a standalone scoring path used elsewhere, I applied the same normalizer there (risk.py, in _command_danger) so both scorers agree. This adds an import of rm_targets_critical_path from safety.py; there's no circular import because safety.py does not import risk.py. I'm flagging this explicitly rather than making a silent out-of-scope change — happy to pull it into a separate PR if you'd prefer to keep this one limited to safety.py.
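
For reference, a quick check of what the old risk.py pattern accepts. The regex is copied from the line quoted above; `old_risk_pattern_matches` is a hypothetical helper that assumes the scorer applies the pattern with `re.search`.

```python
import re

# Pattern quoted from risk.py:15 (the score of 95 is omitted here).
_OLD_DANGEROUS_RM = re.compile(r"\brm\s+-rf?\s+/(?:\s|$)")


def old_risk_pattern_matches(command: str) -> bool:
    """Hypothetical wrapper: does the pre-fix risk.py pattern fire?"""
    return _OLD_DANGEROUS_RM.search(command) is not None


for cmd in (
    "rm -rf /",                  # matches
    "rm -r /",                   # matches ("f" is optional)
    "rm -fr /",                  # misses: "-" must be followed by "r"
    "rm -rf /etc",               # misses: "/" must be followed by space or end
    "rm -r -f /",                # misses: split flags
    "rm --recursive --force /",  # misses: long-form flags
):
    print(f"{cmd!r:<28} -> {old_risk_pattern_matches(cmd)}")
```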

Tests

New SAF-012 section in tests/test_safety.py (23 cases):

13 parametrized bypass/variant forms asserted CRITICAL + FS_DELETE_CRITICAL (covers split flags, long-form, interleaved --no-preserve-root, ~, $HOME, and the already-covered -fr and /etc forms as regression guards).
6 benign paths (./node_modules, build, config.tmp, ./dist, a plain file, ls -la /) asserted not CRITICAL — guards against over-blocking.
3 sync-path cases (check_tool_call_sync) asserting the previously-bypassable forms are now blocked under the default policy.
1 asserting risk.py's score_event scores the variants ≥ 90.

Verification

Full suite 323 passed, 2 skipped on Linux. ruff check clean. mypy reports no new errors (the two pre-existing SafetyCheckData() call-arg notes at the early-return are untouched).
End-to-end check with an approving callback confirms all four variants above now resolve to BLOCKED with the callback never consulted.

One behavioral note (transparency)

echo rm -rf / is classified CRITICAL. This is unchanged from the original regex (which also matched that substring) — neither the old code nor this helper parses shell quoting/word-boundaries. For a safety tool, a false positive on a harmless echo is the safer error than a false negative on a real command, so I kept that bias rather than adding quote-aware parsing (which would be a larger, separate change). Happy to revisit if you'd prefer stricter tokenization.
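
A small illustration of that substring behavior, using the original regex quoted in the root-cause section. `matches_anywhere` is a hypothetical helper that assumes `re.search`. The `shlex.split` line only shows what a quote-aware tokenizer would see; it is not something this PR does.

```python
import re
import shlex

FS_DELETE_CRITICAL = re.compile(
    r"rm\s+-[rf]+\s*(\/|~|\.\.|\$HOME|\$PWD|/home|/etc|/usr|/var|/bin|/sbin|/boot)"
)


def matches_anywhere(command: str) -> bool:
    """The pattern is unanchored, so it fires on any substring of the command."""
    return FS_DELETE_CRITICAL.search(command) is not None


print(matches_anywhere("echo rm -rf /"))    # True
print(matches_anywhere('echo "rm -rf /"'))  # True
# A quote-aware tokenizer would treat the quoted text as one argument to echo:
print(shlex.split('echo "rm -rf /"'))       # ['echo', 'rm -rf /']
```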

Summary by CodeRabbit

New Features
- Stronger detection for recursive+force deletion commands targeting critical filesystem locations; such actions are now escalated to Critical risk and forcibly blocked by default.
Behavior
- Risk scoring now overrides other indicators to ensure dangerous deletion attempts are treated as high-severity and recorded with a standardized critical-deletion reason.
Tests
- Added comprehensive tests covering flag-order/spelling variants and -- option-termination to verify scoring and blocking.

The FS_DELETE_CRITICAL pattern only matched a single adjacent -[rf]+ flag token immediately followed by a critical path, so destructive variants bypassed CRITICAL classification:

rm -rf --no-preserve-root / (flag between -rf and path)
rm -r -f / (recursion and force split)
rm --recursive --force / (long-form flags)
rm -fr / (fell through to HIGH, not auto-blocked)

Add a flag-spelling-agnostic normalizer (rm_targets_critical_path) that tokenizes the command and detects recursive intent, force intent, and a critical target independently. Wire it into RiskScorer.score() and the block_by_default path in _evaluate_safety so all variants classify as CRITICAL and are blocked by default.

The same narrow-regex gap existed in risk.py's separate _DANGEROUS_CMD scorer; apply the normalizer there too.

Add 23 regression tests (SAF-012) covering bypass variants, benign paths, default-policy blocking, and the risk.py scorer.

Closes sreerevanth#123

ecc-tools · 2026-06-03T20:01:32Z

Analyzing 200 commits...

coderabbitai · 2026-06-03T20:01:33Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: df48cbd5-6574-4ba4-8563-527ae1466671

📥 Commits

Reviewing files that changed from the base of the PR and between 9083297 and 2fedde7.

📒 Files selected for processing (1)

agentwatch/core/risk.py

🚧 Files skipped from review as they are similar to previous changes (1)

agentwatch/core/risk.py

📝 Walkthrough

This PR closes a critical security bypass where recursive-force rm commands targeting system paths were not blocked by default when flags were non-adjacent, spelled differently, or in long form. A new rm_targets_critical_path detector is added to normalize command flags and detect recursive-force intent across multiple spellings, then integrated into risk scoring and runtime enforcement to classify and block these variants as CRITICAL by default. Comprehensive tests validate all bypass forms are now detected and blocked.

Changes

Recursive-force rm on critical paths detection and enforcement

Layer / File(s)	Summary
rm critical-path detection helper `agentwatch/core/safety.py`	`rm_targets_critical_path(text)` tokenizes input, detects `rm` with recursive intent (`-r`, `--recursive`) and force intent (`-f`, `--force`, `--no-preserve-root`) across flag forms and flag ordering, respects `--` option termination, and confirms the target argument matches a compiled regex for critical paths.
Risk scoring and command-danger integration `agentwatch/core/risk.py`, `agentwatch/core/safety.py`	`RiskScorer.score` applies `rm_targets_critical_path` as a post-pattern override to promote matched commands to `CRITICAL` and append `FS_DELETE_CRITICAL` with deduping. `_command_danger` imports and applies the same detection to raise danger scores to at least 95 and record `rm_recursive_force_critical_path`.
Default safety enforcement override `agentwatch/core/safety.py`	`SafetyEngine._evaluate_safety` forces `block_immediate = True` when `rm_targets_critical_path` matches and blocking hasn't already occurred.
Comprehensive test suite for bypass variants `tests/test_safety.py`	New SAF-012 tests validate `RiskScorer` classifies many `rm -rf` bypass forms (flag reordering, long-form flags, `--no-preserve-root`) as `CRITICAL` with expected policy match, that non-critical targets are not escalated, that previously-bypassable variants are blocked by default, and that risk scores for bypass variants meet high danger thresholds; also verifies `--` termination behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

sreerevanth/AgentWatch#69: Both PRs modify agentwatch/core/safety.py and the SafetyEngine's risk/blocking flow—this PR adds recursive-force rm critical-path escalation that forces block_immediate, while the retrieved PR changes the text SafetyEngine matches (raw_command/tool_name only), which determines whether the new rm detection/escalation triggers.

Suggested labels

level: intermediate, level2

Poem

🐰 I munched through flags that slipped and slid,
Found hidden gaps where danger hid;
I hopped and stitched each sneaky seam,
Now root's safe in every dream—
Burrow bright and tidy, code well-rid.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately describes the main fix: addressing rm -rf bypass vulnerabilities caused by long-form and split flags evading CRITICAL classification, directly matching the PR's core objective of closing issue `#123`.
Linked Issues check	✅ Passed	The PR implements all key coding requirements from `#123`: a flag-spelling-agnostic normalizer detecting recursive+force intent across all flag forms, integration into RiskScorer and SafetyEngine scoring paths, and comprehensive test coverage for bypass variants and --end-of-options behavior.
Out of Scope Changes check	✅ Passed	All code changes remain focused on addressing `#123`: the rm_targets_critical_path helper and its integration into risk.py and safety.py scoring paths, plus test coverage for bypass detection and --end-of-options behavior.
Docstring Coverage	✅ Passed	Docstring coverage is 84.62% which is sufficient. The required threshold is 80.00%.

ecc-tools · 2026-06-03T20:02:15Z

Analysis Complete

Generated ECC bundle from 1 commits | Confidence: 50%

View Pull Request #129

Repository Profile

Attribute	Value
Language	Python
Framework	Not detected
Commit Convention	conventional
Test Directory	`separate`

Changed Files (3)

Metric	Value
Files changed	3
Additions	178
Deletions	1

Top hotspots

Path	Status	+/-
`tests/test_safety.py`	modified	+87 / -0
`agentwatch/core/safety.py`	modified	+81 / -0
`agentwatch/core/risk.py`	modified	+10 / -1

Top directories

Directory	Files	Total changes
`agentwatch/core`	2	92
`tests`	1	87

Analysis Depth Readiness (commit-history, 7%)

ECC Tools uses this to decide whether recommendations should stay at commit-history/setup guidance or expand into CI, security, harness, reference-set, AI-routing, and team backlog work.

Area	Status	Evidence / Next Step
Commit history	Partial	`1 commits sampled`
CI/CD signals	Missing	Add workflow files or CI troubleshooting evidence so ECC Tools can reason about pipeline setup.
Security evidence	Missing	Add AgentShield, audit, SARIF, SBOM, or security review evidence so recommendations can cover security posture.
Harness configuration	Missing	Add Claude, Codex, OpenCode, Zed, dmux, MCP, plugin, or cross-harness config evidence for harness-agnostic recommendations.
Reference/eval evidence	Missing	Add fixtures, golden traces, reference sets, or evaluator benchmarks so deeper recommendations have regression evidence.
AI routing and cost controls	Missing	Add model-routing, budget, usage, or cost-control files before relying on AI-heavy automation recommendations.
Team handoff and project tracking	Missing	Add roadmap, runbook, project, Linear, or follow-up tracking docs so generated work can land in a team queue.

Reference Set Readiness (0/7, 0%)

Area	Status	Evidence / Next Step
Deep analyzer corpus	Missing	Add analyzer fixture, golden, benchmark, or reference-set files that can catch analyzer regressions.
RAG/evaluator comparison	Missing	Add retrieval or evaluator reference-set comparison fixtures with expected ranking behavior.
PR salvage/review corpus	Missing	Add stale-PR, review-thread, reopen-flow, or salvage reference cases for queue cleanup automation.
Discussion triage corpus	Missing	Add public discussion triage fixtures, golden cases, or reference sets for informational, answered, and no-response classifications.
Harness compatibility	Missing	Add cross-harness, adapter-compliance, or harness-audit evidence for Claude, Codex, OpenCode, Zed, dmux, and agent surfaces.
Security evidence	Missing	Attach security evidence such as SBOMs, SARIF, audit reports, or AgentShield evidence packs.
CI failure-mode evidence	Missing	Add captured CI failure logs, dry-run fixtures, or troubleshooting docs for common workflow failure modes.

Generated Instincts (15)

Domain	Count
git	4
code-style	9
testing	2

After merging, import with:

/instinct-import .claude/homunculus/instincts/inherited/AgentWatch-instincts.yaml

Files

.claude/ecc-tools.json
.claude/skills/AgentWatch/SKILL.md
.agents/skills/AgentWatch/SKILL.md
.agents/skills/AgentWatch/agents/openai.yaml
.claude/identity.json
.codex/config.toml
.codex/AGENTS.md
.codex/agents/explorer.toml
.codex/agents/reviewer.toml
.codex/agents/docs-researcher.toml
.claude/homunculus/instincts/inherited/AgentWatch-instincts.yaml

ECC Tools | Everything Claude Code

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agentwatch/core/safety.py`:
- Around line 252-265: The parser loop over parts[start + 1:] currently treats
"--" by skipping that token but continues parsing subsequent operands as flags;
change it to mark the end of option parsing so tokens after "--" are never
treated as flags. Concretely, introduce a boolean (e.g., end_of_options)
initialized False; when arg == "--" set end_of_options = True and continue; wrap
the flag-parsing branches (the short-flag regex branch that checks "r"/"R"/"f"
and any long-flag checks) so they only run when not end_of_options; leave the
critical-path match using _RM_CRITICAL_PATH_RE and normal operand handling
active for args after the terminator. Ensure you update references to
has_recursive and has_force accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 204fa237-5785-405b-bd76-8281822278f6

📥 Commits

Reviewing files that changed from the base of the PR and between 1dcdbb8 and 6fcbfe3.

📒 Files selected for processing (3)

agentwatch/core/risk.py
agentwatch/core/safety.py
tests/test_safety.py

Addresses review feedback: the rm parser treated tokens after the `--` option terminator as flags, so commands like `rm -r -- --force /etc` (where --force is a filename) were wrongly escalated to recursive+force CRITICAL. Introduce an end_of_options flag: once `--` is seen, later tokens are only matched against the critical-path regex, never parsed as flags. Genuine targets after `--` (e.g. `rm -rf -- /`) still classify CRITICAL since the path matcher continues to run. Add 6 tests (SAF-012): 3 asserting flag-like operands after `--` are not CRITICAL, 3 asserting real critical paths after `--` stay CRITICAL.

SakethSumanBathini · 2026-06-03T20:18:11Z

@coderabbitai good catch — you're right. Fixed in 9083297.
The parser now tracks an end_of_options flag: once -- is seen, subsequent tokens are only matched against the critical-path regex and never parsed as flags. So rm -r -- --force /etc (where --force is a filename) no longer escalates to recursive+force, while genuine targets after the terminator like rm -rf -- / still classify CRITICAL because the path matcher keeps running.
Added 6 tests for it: 3 asserting flag-like operands after -- are not CRITICAL, and 3 asserting real critical paths after -- stay CRITICAL.
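
A compact sketch of the `end_of_options` handling described above. As before, this is an assumption-laden illustration rather than the merged code: the `_sketch` names are hypothetical, whitespace splitting stands in for the real tokenizer, and the path regex is assumed to be prefix-anchored.

```python
import re

# Assumption: prefix-anchored match; the real _RM_CRITICAL_PATH_RE may differ.
_CRITICAL_PATH_SKETCH_RE = re.compile(
    r"^(?:/|~|\.\.|\$HOME|\$PWD|/home|/etc|/usr|/var|/bin|/sbin|/boot)"
)


def rm_targets_critical_path_sketch(text: str) -> bool:
    parts = text.split()
    if "rm" not in parts:
        return False
    start = parts.index("rm")

    has_recursive = has_force = has_critical_target = False
    end_of_options = False
    for arg in parts[start + 1:]:
        if not end_of_options:
            if arg == "--":
                end_of_options = True  # everything after this is an operand
                continue
            if arg == "--no-preserve-root":
                continue
            if arg == "--recursive":
                has_recursive = True
                continue
            if arg == "--force":
                has_force = True
                continue
            if re.fullmatch(r"-[A-Za-z]+", arg):
                has_recursive = has_recursive or "r" in arg or "R" in arg
                has_force = has_force or "f" in arg
                continue
        # Operands, before or after "--", are still checked against the path regex.
        if _CRITICAL_PATH_SKETCH_RE.match(arg):
            has_critical_target = True
    return has_recursive and has_force and has_critical_target


print(rm_targets_critical_path_sketch("rm -r -- --force /etc"))  # False
print(rm_targets_critical_path_sketch("rm -rf -- /"))            # True
```

In the first command `--force` comes after the terminator, so it is treated as a filename and the force signal is never set.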

coderabbitai · 2026-06-03T20:18:28Z

Tip

For best results, initiate chat on the files or code changes.

@SakethSumanBathini: Great work on the fix! The end_of_options approach is the correct POSIX-compliant way to handle --, and the dual test coverage (flag-like operands after -- not escalating + real critical targets after -- still classifying as CRITICAL) looks thorough. Thanks for addressing this promptly.

(∧ ﹏ ∧) 🐇✨

SakethSumanBathini · 2026-06-03T20:41:38Z

Hi @sreerevanth — this is ready whenever you have time. It closes #123 (the rm -rf bypass): a flag-spelling-agnostic normalizer so split flags, long-form flags, and interleaved --no-preserve-root all classify CRITICAL and hard-block, with the same fix applied to risk.py's separate scorer (flagged in the description in case you'd prefer that split out). CodeRabbit's review is green and the -- option-termination edge case it raised is fixed and tested.

Could you add the ELUSOC and VETERAN labels when you get a chance? Issue #123 already carries them. There's also a CI workflow awaiting your approval to run. Thanks for the project — the safety-engine internals were genuinely interesting to dig into. 🙏

sreerevanth · 2026-06-04T10:15:12Z

yea i can merge it if the ci is green 👀

SakethSumanBathini · 2026-06-04T10:54:18Z

Thanks! One CI check is failing (Python Lint & Type Check) — looking into it now and will push the fix shortly. Appreciate the quick response 🙏

SakethSumanBathini · 2026-06-04T11:00:14Z

@sreerevanth the ruff fix is pushed — could you approve the CI workflow to run? (There's a "1 workflow awaiting approval" banner on the PR.) Once you approve it and the checks pass, it should be ready to merge. Thanks! 🙏

SakethSumanBathini · 2026-06-04T15:39:07Z

Hi @sreerevanth, I really appreciate you approving the workflow, and all checks are green now! 🎉

One small thing: I noticed the PR was labeled ADVENTURER, but issue #123 has the VETERAN label on it. Could you update it to VETERAN before merging? Happy to wait; I just want to make sure it scores correctly under ELUSOC.

Thanks so much! 🙏

coderabbitai Bot reviewed Jun 3, 2026

Comment thread agentwatch/core/safety.py Outdated

style(risk): sort imports to satisfy ruff I001

2fedde7

sreerevanth added ADVENTURER ELUSOC labels Jun 4, 2026

sreerevanth added VETERAN and removed ADVENTURER labels Jun 4, 2026

sreerevanth merged commit bfa12d7 into sreerevanth:main Jun 4, 2026
8 checks passed
