
fix(safety): rm -rf bypass — long-form & split flags evade CRITICAL classification (#123) #128

Merged: sreerevanth merged 3 commits into sreerevanth:main from SakethSumanBathini:fix/rm-bypass-normalizer on Jun 4, 2026

Conversation

@SakethSumanBathini
Contributor

@SakethSumanBathini SakethSumanBathini commented Jun 3, 2026

Closes #123

Summary

The recursive-delete guard can be evaded by rewording the flags. The most severe case, rm --recursive --force /, is classified SAFE and executes — a complete bypass of the engine's highest-priority rule. Related forms (rm -r -f /, rm -rf --no-preserve-root /) are misclassified as HIGH instead of CRITICAL, which is only fail-safe on the sync path and becomes exploitable the moment an approval callback is registered.

This PR adds a flag-spelling-agnostic normalizer so any recursive and forced rm aimed at a critical path is classified CRITICAL and hard-blocked, regardless of how the flags are written.

Root cause

FS_DELETE_CRITICAL in BUILTIN_RISK_PATTERNS (agentwatch/core/safety.py:53) is:

r"rm\s+-[rf]+\s*(\/|~|\.\.|\$HOME|\$PWD|/home|/etc|/usr|/var|/bin|/sbin|/boot)"

Two structural assumptions make it evadable:

  1. -[rf]+ requires the recursive and force bits to live in one adjacent token. rm -r -f / (split) and rm --recursive --force / (long-form) don't match.
  2. The flag token must be immediately followed by the path (\s* between them). rm -rf --no-preserve-root / puts a flag in between, so the path isn't adjacent.

When the CRITICAL pattern misses, the command falls through to the HIGH pattern rm\s+-[rf]+ (safety.py:89, block_by_default=False) — or, for long-form flags, matches nothing and scores SAFE.
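
The fall-through described above can be reproduced with the two patterns quoted in this description. This is a minimal sketch, not the engine's code: the `classify` helper and its string levels are illustrative stand-ins, and it assumes the patterns are applied with `re.search` and no extra regex flags. The commands are only matched as strings and never executed.

```python
import re

# Patterns copied from the PR description (safety.py:53 and safety.py:89).
FS_DELETE_CRITICAL = re.compile(
    r"rm\s+-[rf]+\s*(\/|~|\.\.|\$HOME|\$PWD|/home|/etc|/usr|/var|/bin|/sbin|/boot)"
)
FS_DELETE_HIGH = re.compile(r"rm\s+-[rf]+")


def classify(command: str) -> str:
    """Hypothetical helper: highest pre-fix level whose pattern matches."""
    if FS_DELETE_CRITICAL.search(command):
        return "CRITICAL"
    if FS_DELETE_HIGH.search(command):
        return "HIGH"
    return "SAFE"


if __name__ == "__main__":
    for cmd in (
        "rm -rf /",
        "rm -r -f /",
        "rm -rf --no-preserve-root /",
        "rm --recursive --force /",
    ):
        print(f"{cmd!r:32} -> {classify(cmd)}")
```

Under these assumptions the four spellings land on CRITICAL, HIGH, HIGH, and SAFE respectively, which matches the measured table in the next section.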

Exact pre-fix behavior (measured, DEFAULT_POLICY)

With an approval callback registered (the normal production configuration) and the callback approving:

| Command | Classified | Outcome |
| --- | --- | --- |
| rm -rf / | CRITICAL | hard-blocked, callback never consulted ✅ |
| rm -r -f / | HIGH | downgraded to an approval prompt → approved → RUNS ⚠️ |
| rm -rf --no-preserve-root / | HIGH | same: approvable → RUNS ⚠️ |
| rm --recursive --force / | SAFE | no prompt at all → RUNS 🔴 |

So the guarantee "a recursive force-delete of / is always blocked" holds only for the literal rm -rf / spelling. The SAFE case is a full silent bypass; the HIGH cases are blocked only because the sync path fails safe when it can't prompt — once a real approver exists, they become approvable.

(Note: rm -fr / and rm -rf /etc are not bypasses — [rf]+ matches fr, and /etc is in the path list. They already classified CRITICAL and still do. They're included in the test matrix only to prevent regressions.)

The fix

A module-level helper in safety.py, rm_targets_critical_path(text) -> bool, tokenizes the command and independently detects:

  • recursive intent: -r, -R, --recursive, or r/R inside any -XYZ cluster
  • force intent: -f, --force, or f inside a cluster
  • a critical target: /, ~, .., $HOME, $PWD, /home, /etc, /usr, /var, /bin, /sbin, /boot (anchored, via _RM_CRITICAL_PATH_RE)

--no-preserve-root and -- are skipped so they can't mask the path. It returns True only when recursion and force and a critical target are all present, so benign recursive deletes (rm -rf ./node_modules, rm -r ./dist) are unaffected.
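
The detection logic described above could look roughly like the sketch below. It is a reconstruction from this description, not the merged code: the function name carries a `_sketch` suffix, `_CRITICAL_PATH_RE` is a hypothetical stand-in for `_RM_CRITICAL_PATH_RE` (its real definition is not shown here), and treating subpaths such as `/etc/passwd` as critical is an assumption. It splits on whitespace only, so shell quoting is deliberately not parsed.

```python
import re

# Hypothetical stand-in for _RM_CRITICAL_PATH_RE: anchored so that a token
# must BE a critical path (or sit under one), not merely contain one.
_CRITICAL_PATH_RE = re.compile(
    r"^(?:/|~|\.\.|\$HOME|\$PWD|/home|/etc|/usr|/var|/bin|/sbin|/boot)(?:/.*)?$"
)
# A short-option cluster such as -r, -rf, or -fR.
_SHORT_CLUSTER_RE = re.compile(r"^-[A-Za-z]+$")


def rm_targets_critical_path_sketch(text: str) -> bool:
    """Return True only for a recursive AND forced rm aimed at a critical path."""
    parts = text.split()
    if "rm" not in parts:
        return False
    start = parts.index("rm")

    has_recursive = has_force = has_critical_target = False
    for arg in parts[start + 1:]:
        if arg in ("--", "--no-preserve-root"):
            continue  # skipped so they cannot mask the path that follows
        if arg == "--recursive":
            has_recursive = True
        elif arg == "--force":
            has_force = True
        elif _SHORT_CLUSTER_RE.match(arg):
            has_recursive = has_recursive or "r" in arg or "R" in arg
            has_force = has_force or "f" in arg
        elif _CRITICAL_PATH_RE.match(arg):
            has_critical_target = True
    return has_recursive and has_force and has_critical_target
```

Because the three signals are tracked independently, flag order and spelling stop mattering, while `rm -rf ./node_modules` and `rm -r ./dist` still return False since no token matches the anchored path pattern.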

It's wired into the two decision points that matter:

  1. RiskScorer.score() (safety.py:330) — after the pattern loop, if the helper fires it forces matched_level = CRITICAL and appends the FS_DELETE_CRITICAL reason/policy. This fixes classification everywhere score() is consumed.
  2. _evaluate_safety() block loop (safety.py:490) — after the block_by_default scan, if not block_immediate and rm_targets_critical_path(full_text): block_immediate = True. This guarantees a hard block independent of the DSL/approval flow, so it can't be downgraded to an approvable prompt.
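
The two wiring points can be illustrated with a simplified model. Everything here is hypothetical scaffolding: `Level`, `PATTERNS`, `score`, `block_immediately`, the policy name `FS_DELETE_HIGH`, and the toy `long_form_detector` are invented for the sketch and do not mirror the real `RiskScorer` or `SafetyEngine` signatures. Only the two regexes and the `FS_DELETE_CRITICAL` name come from this description.

```python
import re
from enum import IntEnum
from typing import Callable, List, Tuple


class Level(IntEnum):
    SAFE = 0
    HIGH = 1
    CRITICAL = 2


# (policy name, pattern, level, block_by_default); illustrative table only.
PATTERNS = [
    (
        "FS_DELETE_CRITICAL",
        re.compile(
            r"rm\s+-[rf]+\s*(\/|~|\.\.|\$HOME|\$PWD|/home|/etc|/usr|/var|/bin|/sbin|/boot)"
        ),
        Level.CRITICAL,
        True,
    ),
    ("FS_DELETE_HIGH", re.compile(r"rm\s+-[rf]+"), Level.HIGH, False),
]


def long_form_detector(text: str) -> bool:
    """Toy stand-in for rm_targets_critical_path: long-form flags aimed at /."""
    tokens = text.split()
    return all(t in tokens for t in ("rm", "--recursive", "--force", "/"))


def score(text: str, detector: Callable[[str], bool]) -> Tuple[Level, List[str]]:
    level, matched = Level.SAFE, []
    for name, pattern, pattern_level, _ in PATTERNS:
        if pattern.search(text):
            matched.append(name)
            level = max(level, pattern_level)
    # Post-pattern override: force CRITICAL and record the reason once.
    if detector(text):
        level = Level.CRITICAL
        if "FS_DELETE_CRITICAL" not in matched:
            matched.append("FS_DELETE_CRITICAL")
    return level, matched


def block_immediately(text: str, detector: Callable[[str], bool]) -> bool:
    block_immediate = any(
        block and pattern.search(text) for _, pattern, _, block in PATTERNS
    )
    # Hard block that does not depend on the approval flow.
    if not block_immediate and detector(text):
        block_immediate = True
    return block_immediate
```

In this model `rm --recursive --force /` matches no regex at all, yet the override still yields CRITICAL with a single `FS_DELETE_CRITICAL` reason and an immediate block.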

Scope note — risk.py also fixed (please review)

The issue targets safety.py, but agentwatch/core/risk.py has a second, independent dangerous-command scorer (_DANGEROUS_CMD, risk.py:15) with an even narrower pattern:

(re.compile(r"\brm\s+-rf?\s+/(?:\s|$)"), 95)

This matches only rm -rf / and rm -r /, and misses -fr, split flags, long-form flags, and targets such as /etc. Since score_event() is a standalone scoring path used elsewhere, I applied the same normalizer there (risk.py, in _command_danger) so both scorers agree. This adds an import of rm_targets_critical_path from safety.py; there's no circular import because safety.py does not import risk.py. I'm flagging this explicitly rather than making a silent out-of-scope change. Happy to pull it into a separate PR if you'd prefer to keep this one limited to safety.py.
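
The gap in the risk.py pattern is easy to confirm in isolation. A small sketch, using only the regex quoted above; the `narrow_pattern_hits` helper is a hypothetical name, and the commands are matched as strings, never run.

```python
import re

# Pattern copied from the PR description (one entry of _DANGEROUS_CMD, risk.py:15).
NARROW_RM = re.compile(r"\brm\s+-rf?\s+/(?:\s|$)")


def narrow_pattern_hits(command: str) -> bool:
    """Hypothetical helper: does the pre-fix risk.py pattern see this command?"""
    return NARROW_RM.search(command) is not None


if __name__ == "__main__":
    for cmd in (
        "rm -rf /",
        "rm -r /",
        "rm -fr /",
        "rm -r -f /",
        "rm --recursive --force /",
        "rm -rf /etc",
    ):
        print(f"{cmd!r:28} matched={narrow_pattern_hits(cmd)}")
```

Only the first two commands match; the trailing `(?:\s|$)` is what excludes `/etc`, and the literal `-rf?` is what excludes `-fr` and every split or long-form spelling.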

Tests

New SAF-012 section in tests/test_safety.py (23 cases):

  • 13 parametrized bypass/variant forms asserted CRITICAL + FS_DELETE_CRITICAL (covers split flags, long-form, interleaved --no-preserve-root, ~, $HOME, and the already-covered -fr//etc as regression guards).
  • 6 benign paths (./node_modules, build, config.tmp, ./dist, a plain file, ls -la /) asserted not CRITICAL — guards against over-blocking.
  • 3 sync-path cases (check_tool_call_sync) asserting the previously-bypassable forms are now blocked under the default policy.
  • 1 asserting risk.py's score_event scores the variants ≥ 90.

Verification

  • Full suite 323 passed, 2 skipped on Linux. ruff check clean. mypy reports no new errors (the two pre-existing SafetyCheckData() call-arg notes at the early-return are untouched).
  • End-to-end check with an approving callback confirms all four variants above now resolve to BLOCKED with the callback never consulted.

One behavioral note (transparency)

echo rm -rf / is classified CRITICAL. This is unchanged from the original regex (which also matched that substring) — neither the old code nor this helper parses shell quoting/word-boundaries. For a safety tool, a false positive on a harmless echo is the safer error than a false negative on a real command, so I kept that bias rather than adding quote-aware parsing (which would be a larger, separate change). Happy to revisit if you'd prefer stricter tokenization.

Summary by CodeRabbit

  • New Features

    • Stronger detection for recursive+force deletion commands targeting critical filesystem locations; such actions are now escalated to Critical risk and forcibly blocked by default.
  • Behavior

    • Risk scoring now overrides other indicators to ensure dangerous deletion attempts are treated as high-severity and recorded with a standardized critical-deletion reason.
  • Tests

    • Added comprehensive tests covering flag-order/spelling variants and -- option-termination to verify scoring and blocking.

The FS_DELETE_CRITICAL pattern only matched a single adjacent -[rf]+
flag token immediately followed by a critical path, so destructive
variants bypassed CRITICAL classification:

  rm -rf --no-preserve-root /   (flag between -rf and path)
  rm -r -f /                    (recursion and force split)
  rm --recursive --force /      (long-form flags)
  rm -fr /                      (fell through to HIGH, not auto-blocked)

Add a flag-spelling-agnostic normalizer (rm_targets_critical_path) that
tokenizes the command and detects recursive intent, force intent, and a
critical target independently. Wire it into RiskScorer.score() and the
block_by_default path in _evaluate_safety so all variants classify as
CRITICAL and are blocked by default.

The same narrow-regex gap existed in risk.py's separate _DANGEROUS_CMD
scorer; apply the normalizer there too.

Add 23 regression tests (SAF-012) covering bypass variants, benign
paths, default-policy blocking, and the risk.py scorer.

Closes sreerevanth#123
@ecc-tools

ecc-tools Bot commented Jun 3, 2026

Analyzing 200 commits...

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: df48cbd5-6574-4ba4-8563-527ae1466671

📥 Commits

Reviewing files that changed from the base of the PR and between 9083297 and 2fedde7.

📒 Files selected for processing (1)
  • agentwatch/core/risk.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • agentwatch/core/risk.py

📝 Walkthrough

This PR closes a critical security bypass where recursive-force rm commands targeting system paths were not blocked by default when flags were non-adjacent, spelled differently, or in long form. A new rm_targets_critical_path detector is added to normalize command flags and detect recursive-force intent across multiple spellings, then integrated into risk scoring and runtime enforcement to classify and block these variants as CRITICAL by default. Comprehensive tests validate all bypass forms are now detected and blocked.

Changes

Recursive-force rm on critical paths detection and enforcement

| Layer | File(s) | Summary |
| --- | --- | --- |
| rm critical-path detection helper | agentwatch/core/safety.py | rm_targets_critical_path(text) tokenizes input, detects rm with recursive intent (-r, --recursive) and force intent (-f, --force, --no-preserve-root) across flag forms and flag ordering, respects -- option termination, and confirms the target argument matches a compiled regex for critical paths. |
| Risk scoring and command-danger integration | agentwatch/core/risk.py, agentwatch/core/safety.py | RiskScorer.score applies rm_targets_critical_path as a post-pattern override to promote matched commands to CRITICAL and append FS_DELETE_CRITICAL with deduping. _command_danger imports and applies the same detection to raise danger scores to at least 95 and record rm_recursive_force_critical_path. |
| Default safety enforcement override | agentwatch/core/safety.py | SafetyEngine._evaluate_safety forces block_immediate = True when rm_targets_critical_path matches and blocking hasn't already occurred. |
| Comprehensive test suite for bypass variants | tests/test_safety.py | New SAF-012 tests validate RiskScorer classifies many rm -rf bypass forms (flag reordering, long-form flags, --no-preserve-root) as CRITICAL with expected policy match, that non-critical targets are not escalated, that previously-bypassable variants are blocked by default, and that risk scores for bypass variants meet high danger thresholds; also verifies -- termination behavior. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • sreerevanth/AgentWatch#69: Both PRs modify agentwatch/core/safety.py and the SafetyEngine's risk/blocking flow—this PR adds recursive-force rm critical-path escalation that forces block_immediate, while the retrieved PR changes the text SafetyEngine matches (raw_command/tool_name only), which determines whether the new rm detection/escalation triggers.

Suggested labels

level: intermediate, level2

Poem

🐰 I munched through flags that slipped and slid,
Found hidden gaps where danger hid;
I hopped and stitched each sneaky seam,
Now root's safe in every dream—
Burrow bright and tidy, code well-rid.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main fix: addressing rm -rf bypass vulnerabilities caused by long-form and split flags evading CRITICAL classification, directly matching the PR's core objective of closing issue #123. |
| Linked Issues check | ✅ Passed | The PR implements all key coding requirements from #123: a flag-spelling-agnostic normalizer detecting recursive+force intent across all flag forms, integration into RiskScorer and SafetyEngine scoring paths, and comprehensive test coverage for bypass variants and --end-of-options behavior. |
| Out of Scope Changes check | ✅ Passed | All code changes remain focused on addressing #123: the rm_targets_critical_path helper and its integration into risk.py and safety.py scoring paths, plus test coverage for bypass detection and --end-of-options behavior. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 84.62% which is sufficient. The required threshold is 80.00%. |


@ecc-tools

ecc-tools Bot commented Jun 3, 2026

Analysis Complete

Generated ECC bundle from 1 commits | Confidence: 50%

View Pull Request #129

Repository Profile

| Attribute | Value |
| --- | --- |
| Language | Python |
| Framework | Not detected |
| Commit Convention | conventional |
| Test Directory | separate |

Changed Files (3)

| Metric | Value |
| --- | --- |
| Files changed | 3 |
| Additions | 178 |
| Deletions | 1 |

Top hotspots

| Path | Status | +/- |
| --- | --- | --- |
| tests/test_safety.py | modified | +87 / -0 |
| agentwatch/core/safety.py | modified | +81 / -0 |
| agentwatch/core/risk.py | modified | +10 / -1 |

Top directories

| Directory | Files | Total changes |
| --- | --- | --- |
| agentwatch/core | 2 | 92 |
| tests | 1 | 87 |

Analysis Depth Readiness (commit-history, 7%)

ECC Tools uses this to decide whether recommendations should stay at commit-history/setup guidance or expand into CI, security, harness, reference-set, AI-routing, and team backlog work.

| Area | Status | Evidence / Next Step |
| --- | --- | --- |
| Commit history | Partial | 1 commits sampled |
| CI/CD signals | Missing | Add workflow files or CI troubleshooting evidence so ECC Tools can reason about pipeline setup. |
| Security evidence | Missing | Add AgentShield, audit, SARIF, SBOM, or security review evidence so recommendations can cover security posture. |
| Harness configuration | Missing | Add Claude, Codex, OpenCode, Zed, dmux, MCP, plugin, or cross-harness config evidence for harness-agnostic recommendations. |
| Reference/eval evidence | Missing | Add fixtures, golden traces, reference sets, or evaluator benchmarks so deeper recommendations have regression evidence. |
| AI routing and cost controls | Missing | Add model-routing, budget, usage, or cost-control files before relying on AI-heavy automation recommendations. |
| Team handoff and project tracking | Missing | Add roadmap, runbook, project, Linear, or follow-up tracking docs so generated work can land in a team queue. |

Reference Set Readiness (0/7, 0%)

| Area | Status | Evidence / Next Step |
| --- | --- | --- |
| Deep analyzer corpus | Missing | Add analyzer fixture, golden, benchmark, or reference-set files that can catch analyzer regressions. |
| RAG/evaluator comparison | Missing | Add retrieval or evaluator reference-set comparison fixtures with expected ranking behavior. |
| PR salvage/review corpus | Missing | Add stale-PR, review-thread, reopen-flow, or salvage reference cases for queue cleanup automation. |
| Discussion triage corpus | Missing | Add public discussion triage fixtures, golden cases, or reference sets for informational, answered, and no-response classifications. |
| Harness compatibility | Missing | Add cross-harness, adapter-compliance, or harness-audit evidence for Claude, Codex, OpenCode, Zed, dmux, and agent surfaces. |
| Security evidence | Missing | Attach security evidence such as SBOMs, SARIF, audit reports, or AgentShield evidence packs. |
| CI failure-mode evidence | Missing | Add captured CI failure logs, dry-run fixtures, or troubleshooting docs for common workflow failure modes. |

Generated Instincts (15)

| Domain | Count |
| --- | --- |
| git | 4 |
| code-style | 9 |
| testing | 2 |

After merging, import with:

/instinct-import .claude/homunculus/instincts/inherited/AgentWatch-instincts.yaml

Files

  • .claude/ecc-tools.json
  • .claude/skills/AgentWatch/SKILL.md
  • .agents/skills/AgentWatch/SKILL.md
  • .agents/skills/AgentWatch/agents/openai.yaml
  • .claude/identity.json
  • .codex/config.toml
  • .codex/AGENTS.md
  • .codex/agents/explorer.toml
  • .codex/agents/reviewer.toml
  • .codex/agents/docs-researcher.toml
  • .claude/homunculus/instincts/inherited/AgentWatch-instincts.yaml

ECC Tools | Everything Claude Code


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agentwatch/core/safety.py`:
- Around line 252-265: The parser loop over parts[start + 1:] currently treats
"--" by skipping that token but continues parsing subsequent operands as flags;
change it to mark the end of option parsing so tokens after "--" are never
treated as flags. Concretely, introduce a boolean (e.g., end_of_options)
initialized False; when arg == "--" set end_of_options = True and continue; wrap
the flag-parsing branches (the short-flag regex branch that checks "r"/"R"/"f"
and any long-flag checks) so they only run when not end_of_options; leave the
critical-path match using _RM_CRITICAL_PATH_RE and normal operand handling
active for args after the terminator. Ensure you update references to
has_recursive and has_force accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 204fa237-5785-405b-bd76-8281822278f6

📥 Commits

Reviewing files that changed from the base of the PR and between 1dcdbb8 and 6fcbfe3.

📒 Files selected for processing (3)
  • agentwatch/core/risk.py
  • agentwatch/core/safety.py
  • tests/test_safety.py

Comment thread agentwatch/core/safety.py Outdated
Addresses review feedback: the rm parser treated tokens after the `--`
option terminator as flags, so commands like `rm -r -- --force /etc`
(where --force is a filename) were wrongly escalated to recursive+force
CRITICAL. Introduce an end_of_options flag: once `--` is seen, later
tokens are only matched against the critical-path regex, never parsed as
flags. Genuine targets after `--` (e.g. `rm -rf -- /`) still classify
CRITICAL since the path matcher continues to run.

Add 6 tests (SAF-012): 3 asserting flag-like operands after `--` are
not CRITICAL, 3 asserting real critical paths after `--` stay CRITICAL.
@SakethSumanBathini
Contributor Author

@coderabbitai good catch — you're right. Fixed in 9083297.
The parser now tracks an end_of_options flag: once -- is seen, subsequent tokens are only matched against the critical-path regex and never parsed as flags. So rm -r -- --force /etc (where --force is a filename) no longer escalates to recursive+force, while genuine targets after the terminator like rm -rf -- / still classify CRITICAL because the path matcher keeps running.
Added 6 tests for it: 3 asserting flag-like operands after -- are not CRITICAL, and 3 asserting real critical paths after -- stay CRITICAL.
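
The end_of_options behaviour described here can be sketched in isolation. This is an illustrative reconstruction, not the code from 9083297: `parse_rm_flags` and its return shape are hypothetical, and it covers only flag parsing. In the real helper the remaining operands would then be checked against the critical-path regex.

```python
from typing import List, Tuple


def parse_rm_flags(args: List[str]) -> Tuple[bool, bool, List[str]]:
    """Split rm arguments into (has_recursive, has_force, operands).

    Once "--" is seen, every later token is an operand, even if it
    looks like a flag (for example a file literally named --force).
    """
    has_recursive = has_force = False
    operands: List[str] = []
    end_of_options = False

    for arg in args:
        if not end_of_options:
            if arg == "--":
                end_of_options = True
                continue
            if arg == "--recursive":
                has_recursive = True
                continue
            if arg == "--force":
                has_force = True
                continue
            if arg.startswith("--"):
                continue  # other long options (e.g. --no-preserve-root) are ignored
            if arg.startswith("-") and len(arg) > 1:
                # Short cluster such as -r, -rf, -fR.
                has_recursive = has_recursive or "r" in arg or "R" in arg
                has_force = has_force or "f" in arg
                continue
        operands.append(arg)

    return has_recursive, has_force, operands
```

With this shape, `rm -r -- --force /etc` yields recursive without force, so it does not escalate, while `rm -rf -- /` still carries both flags and leaves `/` as an operand for the path check.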

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

@SakethSumanBathini: Great work on the fix! The end_of_options approach is the correct POSIX-compliant way to handle --, and the dual test coverage (flag-like operands after -- not escalating + real critical targets after -- still classifying as CRITICAL) looks thorough. Thanks for addressing this promptly.

(∧ ﹏ ∧) 🐇✨

@SakethSumanBathini
Contributor Author

Hi @sreerevanth — this is ready whenever you have time. It closes #123 (the rm -rf bypass): a flag-spelling-agnostic normalizer so split flags, long-form flags, and interleaved --no-preserve-root all classify CRITICAL and hard-block, with the same fix applied to risk.py's separate scorer (flagged in the description in case you'd prefer that split out). CodeRabbit's review is green and the -- option-termination edge case it raised is fixed and tested.

Could you add the ELUSOC and VETERAN labels when you get a chance? Issue #123 already carries them. There's also a CI workflow awaiting your approval to run. Thanks for the project — the safety-engine internals were genuinely interesting to dig into. 🙏

@sreerevanth
Owner

yea i can merge it if the ci is green 👀

@SakethSumanBathini
Contributor Author

Thanks! One CI check is failing (Python Lint & Type Check) — looking into it now and will push the fix shortly. Appreciate the quick response 🙏

@SakethSumanBathini
Contributor Author

SakethSumanBathini commented Jun 4, 2026

@sreerevanth the ruff fix is pushed — could you approve the CI workflow to run? (There's a "1 workflow awaiting approval" banner on the PR.) Once you approve it and the checks pass, it should be ready to merge. Thanks! 🙏

@SakethSumanBathini
Contributor Author

Hi @sreerevanth — really appreciate you approving the
workflow and all checks are green now! 🎉

One small thing — I noticed the PR was labeled ADVENTURER,
but issue #123 has the VETERAN label on it. Could you
update it to VETERAN before merging? Happy to wait —
just want to make sure it scores correctly under ELUSOC.

Thanks so much! 🙏

@sreerevanth sreerevanth merged commit bfa12d7 into sreerevanth:main Jun 4, 2026
8 checks passed
Development

Successfully merging this pull request may close these issues.

[BUG] Critical rm -rf blocker is bypassable: --no-preserve-root, separated flags, and long-form flags are not blocked by default

2 participants