fix(safety): rm -rf bypass — long-form & split flags evade CRITICAL classification (#123)#128
Conversation
The FS_DELETE_CRITICAL pattern only matched a single adjacent -[rf]+ flag token immediately followed by a critical path, so destructive variants bypassed CRITICAL classification: rm -rf --no-preserve-root / (flag between -rf and path) rm -r -f / (recursion and force split) rm --recursive --force / (long-form flags) rm -fr / (fell through to HIGH, not auto-blocked) Add a flag-spelling-agnostic normalizer (rm_targets_critical_path) that tokenizes the command and detects recursive intent, force intent, and a critical target independently. Wire it into RiskScorer.score() and the block_by_default path in _evaluate_safety so all variants classify as CRITICAL and are blocked by default. The same narrow-regex gap existed in risk.py's separate _DANGEROUS_CMD scorer; apply the normalizer there too. Add 23 regression tests (SAF-012) covering bypass variants, benign paths, default-policy blocking, and the risk.py scorer. Closes sreerevanth#123
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR closes a critical security bypass where recursive-force ChangesRecursive-force rm on critical paths detection and enforcement
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Analysis CompleteGenerated ECC bundle from 1 commits | Confidence: 50% View Pull Request #129Repository Profile
Changed Files (3)
Top hotspots
Top directories
Analysis Depth Readiness (commit-history, 7%)ECC Tools uses this to decide whether recommendations should stay at commit-history/setup guidance or expand into CI, security, harness, reference-set, AI-routing, and team backlog work.
Reference Set Readiness (0/7, 0%)
Generated Instincts (15)
After merging, import with: Files
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@agentwatch/core/safety.py`:
- Around line 252-265: The parser loop over parts[start + 1:] currently treats
"--" by skipping that token but continues parsing subsequent operands as flags;
change it to mark the end of option parsing so tokens after "--" are never
treated as flags. Concretely, introduce a boolean (e.g., end_of_options)
initialized False; when arg == "--" set end_of_options = True and continue; wrap
the flag-parsing branches (the short-flag regex branch that checks "r"/"R"/"f"
and any long-flag checks) so they only run when not end_of_options; leave the
critical-path match using _RM_CRITICAL_PATH_RE and normal operand handling
active for args after the terminator. Ensure you update references to
has_recursive and has_force accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 204fa237-5785-405b-bd76-8281822278f6
📒 Files selected for processing (3)
agentwatch/core/risk.pyagentwatch/core/safety.pytests/test_safety.py
Addresses review feedback: the rm parser treated tokens after the `--` option terminator as flags, so commands like `rm -r -- --force /etc` (where --force is a filename) were wrongly escalated to recursive+force CRITICAL. Introduce an end_of_options flag: once `--` is seen, later tokens are only matched against the critical-path regex, never parsed as flags. Genuine targets after `--` (e.g. `rm -rf -- /`) still classify CRITICAL since the path matcher continues to run. Add 6 tests (SAF-012): 3 asserting flag-like operands after `--` are not CRITICAL, 3 asserting real critical paths after `--` stay CRITICAL.
@coderabbitai good catch — you're right. Fixed in 9083297. |
|
Tip For best results, initiate chat on the files or code changes.
(∧ ﹏ ∧) 🐇✨ |
|
Hi @sreerevanth — this is ready whenever you have time. It closes #123 (the rm -rf bypass): a flag-spelling-agnostic normalizer so split flags, long-form flags, and interleaved --no-preserve-root all classify CRITICAL and hard-block, with the same fix applied to risk.py's separate scorer (flagged in the description in case you'd prefer that split out). CodeRabbit's review is green and the -- option-termination edge case it raised is fixed and tested. Could you add the |
yea i can merge it if the ci is green 👀 |
Thanks! One CI check is failing (Python Lint & Type Check) — looking into it now and will push the fix shortly. Appreciate the quick response 🙏 |
|
@sreerevanth the ruff fix is pushed — could you approve the CI workflow to run? (There's a "1 workflow awaiting approval" banner on the PR.) Once you approve it and the checks pass, it should be ready to merge. Thanks! 🙏 |
|
Hi @sreerevanth — really appreciate you approving the One small thing — I noticed the PR was labeled ADVENTURER, Thanks so much! 🙏 |
Closes #123
Summary
The recursive-delete guard can be evaded by rewording the flags. The most severe case,
rm --recursive --force /, is classified SAFE and executes — a complete bypass of the engine's highest-priority rule. Related forms (rm -r -f /,rm -rf --no-preserve-root /) are misclassified as HIGH instead of CRITICAL, which is only fail-safe on the sync path and becomes exploitable the moment an approval callback is registered.This PR adds a flag-spelling-agnostic normalizer so any recursive and forced
rmaimed at a critical path is classified CRITICAL and hard-blocked, regardless of how the flags are written.Root cause
FS_DELETE_CRITICALinBUILTIN_RISK_PATTERNS(agentwatch/core/safety.py:53) is:Two structural assumptions make it evadable:
-[rf]+requires the recursive and force bits to live in one adjacent token.rm -r -f /(split) andrm --recursive --force /(long-form) don't match.\s*between them).rm -rf --no-preserve-root /puts a flag in between, so the path isn't adjacent.When the CRITICAL pattern misses, the command falls through to the HIGH pattern
rm\s+-[rf]+(safety.py:89,block_by_default=False) — or, for long-form flags, matches nothing and scores SAFE.Exact pre-fix behavior (measured,
DEFAULT_POLICY)With an approval callback registered (the normal production configuration) and the callback approving:
rm -rf /rm -r -f /rm -rf --no-preserve-root /rm --recursive --force /So the guarantee "a recursive force-delete of
/is always blocked" holds only for the literalrm -rf /spelling. TheSAFEcase is a full silent bypass; theHIGHcases are blocked only because the sync path fails safe when it can't prompt — once a real approver exists, they become approvable.(Note:
rm -fr /andrm -rf /etcare not bypasses —[rf]+matchesfr, and/etcis in the path list. They already classified CRITICAL and still do. They're included in the test matrix only to prevent regressions.)The fix
A module-level helper in
safety.py,rm_targets_critical_path(text) -> bool, tokenizes the command and independently detects:-r,-R,--recursive, orr/Rinside any-XYZcluster-f,--force, orfinside a cluster/,~,..,$HOME,$PWD,/home,/etc,/usr,/var,/bin,/sbin,/boot(anchored, via_RM_CRITICAL_PATH_RE)--no-preserve-rootand--are skipped so they can't mask the path. It returnsTrueonly when recursion and force and a critical target are all present, so benign recursive deletes (rm -rf ./node_modules,rm -r ./dist) are unaffected.It's wired into the two decision points that matter:
RiskScorer.score()(safety.py:330) — after the pattern loop, if the helper fires it forcesmatched_level = CRITICALand appends theFS_DELETE_CRITICALreason/policy. This fixes classification everywherescore()is consumed._evaluate_safety()block loop (safety.py:490) — after theblock_by_defaultscan,if not block_immediate and rm_targets_critical_path(full_text): block_immediate = True. This guarantees a hard block independent of the DSL/approval flow, so it can't be downgraded to an approvable prompt.Scope note —
risk.pyalso fixed (please review)The issue targets
safety.py, butagentwatch/core/risk.pyhas a second, independent dangerous-command scorer (_DANGEROUS_CMD,risk.py:15) with an even narrower pattern:This matches only
rm -rf //rm -r /and misses-fr, split flags, long-form, and/etc. Sincescore_event()is a standalone scoring path used elsewhere, I applied the same normalizer there (risk.py, in_command_danger) so both scorers agree. This adds an import ofrm_targets_critical_pathfromsafety.py; there's no circular import becausesafety.pydoes not importrisk.py. I'm flagging this explicitly rather than making a silent out-of-scope change — happy to pull it into a separate PR if you'd prefer to keep this one limited tosafety.py.Tests
New
SAF-012section intests/test_safety.py(23 cases):CRITICAL+FS_DELETE_CRITICAL(covers split flags, long-form, interleaved--no-preserve-root,~,$HOME, and the already-covered-fr//etcas regression guards)../node_modules,build,config.tmp,./dist, a plain file,ls -la /) asserted notCRITICAL— guards against over-blocking.check_tool_call_sync) asserting the previously-bypassable forms are now blocked under the default policy.risk.py'sscore_eventscores the variants ≥ 90.Verification
ruff checkclean.mypyreports no new errors (the two pre-existingSafetyCheckData()call-arg notes at the early-return are untouched).BLOCKEDwith the callback never consulted.One behavioral note (transparency)
echo rm -rf /is classifiedCRITICAL. This is unchanged from the original regex (which also matched that substring) — neither the old code nor this helper parses shell quoting/word-boundaries. For a safety tool, a false positive on a harmlessechois the safer error than a false negative on a real command, so I kept that bias rather than adding quote-aware parsing (which would be a larger, separate change). Happy to revisit if you'd prefer stricter tokenization.Summary by CodeRabbit
New Features
Behavior
Tests
--option-termination to verify scoring and blocking.