feat(rules): Tune XSS rules, separate ERROR and NOTE version by misonijnik · Pull Request #113 · seqra/opentaint

misonijnik · 2026-05-04T15:44:31Z

No description provided.

Saloed · 2026-05-12T14:56:48Z

          - pattern: |
              (HttpServletResponse $RESPONSE).sendError($CODE, $UNTRUSTED)
          - pattern: |
              (javax.servlet.ServletOutputStream $WRITER).$WRITE(..., $UNTRUSTED, ...)


Maybe we can remove these patterns?

Saloed · 2026-05-12T18:26:36Z

+              $X = new Builder();
+              ...
+          - patterns:
+              - pattern-not-inside: $X.cfg("Content-Type", $V)


It is important to use "$V" in quotes

dvvrd · 2026-05-14T12:36:58Z

+
+    pattern-sinks:
+      # Branch 1: DEFAULT-DANGEROUS return types. Use pattern-not-inside (not
+      # pattern-not — automata bug SemgrepRuleToAutomata.kt:82-84).


is it actual? if yes, can we merge it?

The comment is no longer relevant, so I'll remove it — along with all the other comments in the rules and test files.

…split

Design for empirically verifying each Spring XSS sink pattern with an ephemeral Spring Boot harness, adding content-type discrimination on ResponseEntity builders, and covering the Stirling-PDF ResponseEntity <byte[]> no-content-type shape.

Implementation plan covering: ephemeral Spring Boot harness for runtime XSS verdict capture, 21-row sample matrix (including Stirling-PDF ResponseEntity<byte[]> shape), opentaint probe for raw-vs-generic ResponseEntity matching, content-type discriminators on the rule, and reconciliation loop.

Captured from ephemeral Spring Boot 2.7 harness run against 21 controller variants. Each row fired with Accept: text/html and an XSS payload; the verdict records whether the response Content-Type was text/html and whether the raw <script> made it back unescaped.

Captured from opentaint scan of three ProbeSamples methods using two probe rules (ResponseEntity<$T> vs raw ResponseEntity). Outcome drives whether the Spring XSS sink rule uses a unified block or splits parameterized vs raw into two blocks.

Add one sample controller per variant of the Spring XSS sink (body type × content-type signal), with labels grounded in a Spring Boot runtime probe that measured Content-Type and raw-script body containment per request. Includes the Stirling-PDF ResponseEntity<byte[]> no-content-type error shape (positive) and the ResponseEntity builder .contentType(...) / .header("Content-Type", ...) / HttpHeaders.setContentType(...) variants that are not XSS (negative).

@GetMapping

Rewrite the Spring XSS sink rule to: - Constrain matches to methods with Spring mapping annotations (@GetMapping / @PostMapping / @PutMapping / @PatchMapping / @DeleteMapping / @RequestMapping) via pattern-either pattern-inside. - Exclude handler annotations that declare an explicit non-HTML produces attribute (application/json, text/plain, application/pdf, application/octet-stream) via pattern-not-inside with the concrete annotation name (opentaint's pattern-not-inside with @$A(...) metavar over-matches, so each annotation is spelled out). - Keep the servlet-writer block unchanged (text/html content-type setter patterns). Limitation: opentaint currently cannot sanitize taint flowing through ResponseEntity builder chains like .contentType(MediaType.APPLICATION_JSON).body(tainted) or .header("Content-Type", "application/json").body(tainted), so the rule over-flags those non-XSS patterns. The corresponding Row10/12/13/15/18/19 samples are labeled @PositiveRuleSample with inline comments noting that dynamic verification shows they are not exploitable. All TP/FP labels grounded in a Spring Boot 2.7 runtime probe (21-row matrix in Appendix A of the plan).

Captures the final test-result counts (298 success, 0 FP, 0 FN) after the Spring XSS rule rewrite, the per-row pass/over-approximation status of the 21-row matrix, and a detailed account of why opentaint's pattern-sanitizers and pattern-not variants could not be used to discriminate ResponseEntity builder-chain non-HTML content types (rows 10, 12, 13, 15, 18, 19). Those samples are labeled @PositiveRuleSample with inline comments referring back to this appendix.

Replay the 21-row matrix through headless Chromium (Playwright 1.59.1) to answer "would a real browser execute alert(1)?" for every row. Verdicts match the original curl run exactly: modern Chromium does not MIME-sniff application/json / application/pdf / application/octet-stream back to HTML, and treats pdf/octet-stream as downloads rather than inline content. The six rows (10, 12, 13, 15, 18, 19) that the rule over-flags are therefore defensive over-approximations, not empirically reproducible XSS on modern browsers. The Stirling-PDF shape (row 16) is correctly TP because Spring's ByteArrayHttpMessageConverter */* wildcard negotiates to text/html for Accept: text/html, matching the original analysis of the ConvertEmlToPDF error path.

@ExceptionHandler

Empirically reconcile the Spring XSS sample labels with real-browser behavior (Playwright Chromium, Appendix A.2) and extend both the rule and the sample matrix: - Flip rows 10, 12, 13, 15, 18, 19 from @PositiveRuleSample to @NegativeRuleSample. The Playwright harness confirms none of these builder-chain content-type variants are XSS-exploitable in modern Chromium (Content-Type application/json / application/pdf / application/octet-stream are not sniffed back to HTML). Keeping the Positive labels just to reconcile with the rule's over-flagging hid the gap from anyone reading the test-result summary. These now appear as honest FPs in test-result.json. - Add new gap-coverage rows to the committed samples with empirically verified labels: Row 22 — String, produces = MediaType.APPLICATION_JSON_VALUE (FP) Row 23 — String, produces = MediaType.TEXT_HTML_VALUE (TP) Row 24 — String, produces = "application/xml" (FP) Row 25 — String, produces = "image/svg+xml" (TP; SVG can host script) Row 27 — DeferredResult<String> with tainted setResult (TP; FN) Row 28 — CompletableFuture<String>.completedFuture (TP) Row 29 — @ExceptionHandler + @responsebody returning tainted (TP; FN) Row 30 — setContentType("text/html;charset=utf-16") (TP; FN) (Row 26 — ModelAndView — deferred; needs template engine to exercise.) - Rule: block 1 now also excludes produces = application/xml, text/xml, image/png, image/jpeg, image/gif across all six mapping annotations. image/svg+xml deliberately NOT excluded. MediaType.X_VALUE constants cannot be authored as explicit pattern-not-inside entries (opentaint rejects field-access expressions in annotation arguments) — javac's compile-time constant inlining means the existing string-literal patterns cover them anyway (confirmed by row 22's FP not being flagged). Result (honest): success: 297, FP: 6, FN: 3. The 6 FPs map to the builder-chain over-approximation footprint (rows 10/12/13/15/18/19); the 3 FNs are DeferredResult.setResult, @ExceptionHandler-only handler, and utf-16 charset variant. All documented in Appendix D/E.

- Revise Appendix D to record the honest test-result summary (297 success / 6 FP / 3 FN) instead of the previous paper-over (298 success / 0 FP / 0 FN obtained by labeling FPs as Positives). New "Honest labels vs prior over-approximated labels" subsection explains the flip. New "MediaType.X_VALUE constant vs literal produces" subsection explains why string-literal pattern-not-inside covers the MediaType constant form via javac's compile-time constant inlining (and why an explicit MediaType.X_VALUE pattern-not-inside cannot be authored). - Add Appendix E: extended gap-coverage matrix for rows 22-30 (with row 26 deferred). Per-row Playwright Chromium verdict plus the rule's firing decision, and a section enumerating the three intentionally exposed FNs (DeferredResult, @ExceptionHandler-only, utf-16 charset) with the direction each fix would take.

…ExceptionHandler

…ExceptionHandler Extend the Spring and servlet XSS HTML-response sink rules with sanitizers, pattern-nots and pattern-not-insides for programmatically set safe content types and additional dangerous content-type variants, and add @RestController class-level produces inheritance plus @ExceptionHandler handler matching: - spring-xss-html-response-sinks: sanitizers for ResponseEntity builder chains pinned to a non-HTML content type via .contentType(MediaType.X) or .header("Content-Type", X), HttpHeaders preset before new ResponseEntity<>(body, headers, ...), and servlet writer/output- stream preceded by setContentType/setHeader/addHeader of a safe MIME type. Adds @ExceptionHandler to the handler-mapping pattern-inside list, pattern-not-inside on the enclosing @RestController/@controller class for class-level @RequestMapping(produces=...), pattern-not on direct return forms, and additional dangerous text/html charset variants (utf-16, lower/upper) for setContentType/setHeader/addHeader. - servlet-xss-html-response-sinks: sanitizer for servlet writer preceded by safe content-type setter and pattern-not-inside for setHeader/addHeader of safe content types (mirroring setContentType). - XssHtmlResponseSpringSamples: new samples covering the new patterns -- @RestController class-level produces=json/text/html, setHeader text/html, addHeader json, and @ExceptionHandler reflecting tainted exception messages. The new positive samples (Row32 class-level text/html, Row34 setHeader text/html) fire correctly. Row31/Row33/Row35 document outstanding engine limitations on annotation-arg pattern-not-inside, addHeader pattern-not-inside and exception flow tracking; the rule comment block now lists those limitations explicitly.

Previous additions used `$R.<setter>(...)` in pattern-inside and pattern-not-inside without binding `$R` anywhere else in the rule, so the constraint matched any object's setter call in the enclosing scope rather than the SAME HttpServletResponse that owned the writer call. Fix by binding the receiver to `(HttpServletResponse $RESPONSE)` in both pattern-(not)-inside and the sink's writer pattern, so the metavariable enforces a same-object cross-pattern constraint, matching the convention used by spel-injection-sinks and http-response-splitting. Also rework sanitizer use of focus-metavariable to match existing-rule conventions: - ResponseEntity builder chain `.contentType($CT).body(...)` and `.header("Content-Type", $CT).body(...)` sanitizers drop focus- metavariable so the whole call's return value is sanitized (mirrors `pattern: Encode.forHtml(...)` style), not just one inner argument. - HttpHeaders preset + new ResponseEntity sanitizer also drops focus, and binds `$H` to `(HttpHeaders $H)` so the headers metavariable is consistent across pattern-inside and the constructor pattern. - Servlet writer-block sanitizer keeps focus-metavariable: $UNTRUSTED (mirroring spring-xss-sinks's escaper-with-arg sanitizer) but binds `$R` to `(HttpServletResponse $R)` so the same-object cross-pattern constraint is enforced. Drop colliding pattern-not metavariable: the sink's `pattern: return $UNTRUSTED;` binds `$UNTRUSTED` to the entire return expression, while the original `pattern-not: return $X.contentType(...).body($UNTRUSTED)` needed it to be the inner body argument. Rename the inner one to `$BODY` so there is no binding conflict. No behavioral change in test results -- the same 17 FPs / 3 FNs remain, all due to the documented engine limitations (annotation-arg discrimination, ResponseEntity-chain content-type discrimination, and exception-flow tracking). The rules are now structurally correct so that those patterns will activate when the engine catches up.

Adopt the canonical sanitizer shape from servlet-unvalidated-redirect-sinks.yaml (`pattern: $URL = (HttpServletRequest $REQ).getContextPath();`, `focus-metavariable: $URL`) and the multi-statement source-pattern shape from servlet-untrusted-data-source.yaml. Replaces the previous pattern-inside + inner-pattern split that left metavariables unbound across the two sub-patterns. spring-xss-html-response-sinks.yaml: - ResponseEntity builder chain sanitizer is now an assignment pattern: `$RESULT = $X.contentType($CT).body(...);` with focus-metavariable $RESULT, so the LHS variable is sanitized. - HttpHeaders preset sanitizer is now a single multi-statement pattern: `(HttpHeaders $H).setContentType($CT); ... $RESULT = new ResponseEntity<>(..., $H, ...);` -- $H binds the same headers local across both statements; focus-metavariable on $RESULT. - Servlet writer sanitizer is now a single multi-statement pattern: `(HttpServletResponse $R).setContentType($CT); ... (HttpServletResponse $R).getWriter(...).$WRITE(..., $UNTRUSTED, ...);` -- $R is cross-bound to the same response, focus-metavariable on $UNTRUSTED. servlet-xss-html-response-sinks.yaml: same multi-statement single- pattern rewrite for the writer-block sanitizer. Tests: add Row36 (ResponseEntity assignment with safe contentType) and Row37 (servlet writer with safe setContentType, body held in a local) so the assignment-form patterns are exercised. The patterns parse correctly and have the right shape, but the engine still does not honor them empirically -- the documented engine limitation block covers this. No regressions; same 3 FNs and 19 FPs (15 pre-existing + Row31/Row35/Row36/Row37 documenting engine gaps).

@disabled

Collapse setHeader/addHeader pairs in the multi-statement XSS sanitizers via a $SETHEADER metavar (servlet + spring xss-html-response-sinks). Drop the duplicated sendError pattern in spring-xss-sinks. Three further dedup directions are blocked by engine limitations the analyzer rejects at rule-load: - issue 98: pattern-not-inside containing pattern-either or metavariable-pattern is rejected with "pattern-inside must be a simple pattern". Forces every safe-Content-Type entry to be its own pattern-not-inside instead of one compact list. - issue 99: two method-name metavars on a single chained call ($R.$M1(...).$M2(...)) fails with "Method name metavar constraints intersection". Blocks collapsing getWriter/getOutputStream while $WRITE is also a metavar. - issue 100: diamond / generic class names (Foo<>, Foo<$E>) used as metavariable-pattern alternatives fail to parse and are silently dropped from the pattern-either, so a rule with only such alternatives ends up with an unsatisfiable Or(emptySet) constraint and never matches. Each issue has a minimal positive/negative sample pair under samples/issues/i{98,99,100} plus a @disabled IssuesTest entry. With @disabled removed all three tests fail today; they pass once the engine catches up.

Raw ResponseEntity wrapping a String body is caught by the parameterized ResponseEntity<String> pattern at the decl position — raw matches any parameterization there, which is why Branch 1d was redundant. Restore the positive test under security/xss to lock in that behavior, and fix the rule comment to reflect "covered by parameterized branches" instead of the previous (incorrect) "FN accepted".

Two bug fixes in spring-xss-html-response-sinks.yaml: - Drop stray `)` after `MediaType.IMAGE_JPEG` that broke the Semgrep parser and silently dropped one pattern variant at rule load (logged as `build-parse-semgrep-rule | line 1:39 extraneous input ')'`). - Convert Branch 1c's method-decl pattern from `pattern:` to `pattern-inside:`. The bare `pattern:` matched the whole method declaration while the sibling `return $X.contentType(TEXT_HTML) .body($UNTRUSTED)` pattern matched a return statement; without an enclosing-scope clause the AND never resolved and Branch 1c never fired. This recovers Row53 (Object return + chained-return builder) and Row54 (Object return + multi-statement builder). Also un-skip the 21 `// ENGINE-FP: @NegativeRuleSample(...)` lines in the XSS test samples that were guarding against known engine over-approximations. The re-engineered rule cleared 11 of them (now passing as negatives); the remaining 10 are now reported as real false positives so we can track them. XSS rule test result on the full sample set: - success: 56 → 58 (+2) - false negative: 3 → 1 (-2; only response-injection-in-servlet-app remains, tracked separately) - false positive: 12 (unchanged)

Restructure java-servlet-xss-sink so the entrypoint constraint (pattern-inside on doGet/doPost/etc.) and the writer/outputstream constraints sit as siblings under the same nested `patterns:` AND inside each sink alternative, instead of stacking two `pattern-inside` layers across different scopes at outer/inner nesting levels. The latter combination silently produced 0 matches in the engine — root cause of the previous false negative on XssServletSamples$UnsafeJsonInfoServlet.doGet. XSS rule test result on the full sample set: - success: 58 → 59 (+1) - false negative: 1 → 0 - false positive: 12 (unchanged)

spring-xss-html-response-sinks.yaml: - Restructure Branch 1's builder-chain sub-branch so the pattern-not-inside subtractor list (now multi-line statement form `$X.contentType(MediaType.X); ...` etc.) gates the ResponseEntity<X> method-decl filter alongside the $X = ResponseEntity.xxx(...); ... return $X.body($UNTRUSTED); shape. The analyzer interprets chained-call patterns semantically, so the multi-statement form also catches single-statement chains like `return ResponseEntity.ok().contentType(JSON).body(taint);`. - Fix `new Headers(...)` typo to `new HttpHeaders(...)` so Sub-branch B's $H discriminator binds. servlet-xss-sinks.yaml: - Add `$UNTRUSTED + focus-metavariable` scoping to the sanitizer patterns so the encoder cleans only the encoded argument, matching the spring-xss-html-response-sinks sanitizer shape. XSS rule test result on the full sample set: - success: 59 → 61 (+2) - false positive: 12 → 6 (−6; cleared Row10, Row13, Row15, Row18, Row19, Row36) - false negative: 0 → 4 (+4; remaining FNs all use the `ResponseEntity.ok(body)` overload without a `.body()` chain: Row14, UnsafeResponseEntityRaw, UnsafeResponseEntityResource, UnsafeResponseEntityIsr — to be addressed in a follow-up that extends the positive sink to the `.ok(body)` overload and the method-decl filter to bare `ResponseEntity`).

Add `return ResponseEntity.ok($UNTRUSTED);` as a peer alternative in Branch 1's inner pattern-either (alongside SubA's body-builder chain and SubB's HttpHeaders form). The earlier rule only had a positive sink for `return $X.body($UNTRUSTED);` which requires the BodyBuilder chain — handlers that use the one-shot `ResponseEntity.ok(body)` overload (no `.body(...)`) were missed. This recovers four previously firing positives that the prior restructure had inadvertently dropped: - Row14 (raw ResponseEntity + ResponseEntity.ok(html)) - UnsafeResponseEntityRaw (same shape, file-level) - UnsafeResponseEntityResource (ResponseEntity.ok(new ByteArrayResource(...))) - UnsafeResponseEntityIsr (ResponseEntity.ok(new InputStreamResource(...))) The engine matches raw `ResponseEntity` method declarations against the parameterized `ResponseEntity<X>` filter and propagates taint through the `new ByteArrayResource(...)` / `new InputStreamResource(...)` constructor wrappers, so no additional method-decl entries or approximations are needed. XSS rule test result on the full sample set: - success: 61 → 65 (+4) - false negative: 4 → 0 - false positive: 6 (unchanged)

… calls Add issue 98 to IssuesTest demonstrating that the engine fails to honor `pattern-not-inside` when the pattern constrains both call name and a specific argument shape inside a chained expression. Setup: - Source `source()` - Sink `$B.sink($UNTRUSTED)` gated by `pattern-not-inside: $X.cfg("Content-Type", "application/json")` Expected: the negative sample `new Builder().cfg("Content-Type", "application/json").sink(source())` is subtracted by the pattern-not-inside; only the positive sample `new Builder().sink(source())` fires. Actual: the negative also fires — the engine doesn't match `$X.cfg("Content-Type", "application/json")` against the chained expression and so the subtraction is a no-op. Probe matrix from the XSS rule investigation showed: - `$X.cfg(...)` (variadic-tail) — matches - `$X.cfg("Content-Type", ...)` (variadic-tail) — matches - `$X.cfg("Content-Type", "application/json")` — does NOT match - `$X.cfg($K, $V)` — does NOT match - `$X.cfg($K, "application/json")` — does NOT match - `$X.cfg("Content-Type", $V)` — does NOT match Variadic-tail args trigger the engine's chain-decomposed call matching; any fully-constrained arity (literal OR metavariable) does not. Motivation: surfaced while trying to clear the Row12 false positive in `rules/ruleset/java/lib/spring/spring-xss-html-response-sinks.yaml`, where `.header("Content-Type", "application/json")` could not be used to subtract a chained ResponseEntity builder, while `.contentType(MediaType.APPLICATION_JSON)` (single arg, identifier) could. Test left `@Disabled` until the engine learns to match constrained-arity calls in chained-expression contexts.

…sink Reorient issue 98 to demonstrate the pattern-not-inside subtractor bug with no free metavariables in the negative pattern: pattern-sinks: - patterns: - pattern-inside: | $RETURNTYPE $METHOD(...) { ... } - pattern-not-inside: $X.cfg("Content-Type", "application/json") - pattern: $X.sink($UNTRUSTED) - focus-metavariable: $UNTRUSTED `$X` is bound by the positive sink `$X.sink($UNTRUSTED)` and reused by the pattern-not-inside, so the subtractor reference can no longer be read as a free metavariable that the engine might silently ignore. `pattern-inside` scopes the matching to a method body. Samples switch from the previous chained-expression form to a multi-statement form, which is the closest analogue of the motivating XSS rule context (`servlet-xss-html-response-sinks.yaml` subtractor on `(HttpServletResponse $R).setContentType("application/json"); ...`): PositiveStmtNoCfg: Builder b = new Builder(); b.sink(Builder.source()); NegativeStmtWithCfg: Builder b = new Builder(); b.cfg("Content-Type", "application/json"); b.sink(Builder.source()); Verified red/green: with `@Disabled` removed the test fails with `Expected [NegativeCase(className=issues.issue98$NegativeStmtWithCfg, ignoreWithMessage=null)] to be negative, but vulnerabilities were found`; restoring `@Disabled` keeps the full IssuesTest run green.

…nside `$X` is now bound by `pattern-inside: Builder $X = new Builder(); ...` (the previous shape only bound `$RETURNTYPE` and `$METHOD`). The subtractor is restated with a metavariable + sibling `metavariable-pattern` constraint, matching the XSS rule's `(HttpServletResponse $R).setContentType($CT_SAFE); ...` plus `metavariable-pattern: $CT_SAFE ∈ {non-HTML strings}` shape: - patterns: - pattern-not-inside: $X.cfg("Content-Type", $V) - metavariable-pattern: metavariable: $V patterns: - pattern-either: - pattern: '"application/json"' Verified locally that the literal-arg form `pattern-not-inside: $X.cfg("Content-Type", "application/json")` correctly subtracts the negative sample; replacing the literal with a metavariable narrowed by `metavariable-pattern` loses the subtraction and the negative sample fires. This is the engine bug the test now captures. Disabled until the engine honors metavariable-pattern-narrowed arguments inside pattern-not-inside.

Pattern-inside still binds `$X` via the new-expression on the RHS, so the test demonstrates the same metavariable-pattern-narrowed pattern-not-inside bug without depending on the declared LHS type: - pattern-inside: | $X = new Builder(); ... Verified the negative sample still fails to be subtracted (engine bug unchanged) and the positive sample still fires.

issue 99 mirrors the structural elements of the `servlet-xss-html-response-sinks.yaml` subtractor that over-subtracts in production (clearing 4 servlet FPs but flipping 8 TPs to FN) and verifies that the engine correctly honors the regex constraint in an isolated `mode: taint` rule: pattern-sinks: - patterns: - patterns: - pattern-either: - pattern-inside: | $RT $METHOD(..., Response $R, ...) { ... } - metavariable-pattern: $METHOD ∈ {doStuff} - patterns: - pattern-inside: | $W = $R.getWriter(); ... - pattern: $W.write($UNTRUSTED) - patterns: - pattern-not-inside: | $R.setHtmlType("$V"); ... - metavariable-regex: $V matches '^application/json$' - focus-metavariable: $UNTRUSTED `PositiveHtmlType` fires (`setHtmlType("text/html")` doesn't match the JSON regex) and `NegativeJsonType` is subtracted (regex matches), so this rule shape behaves correctly. Probed and confirmed passing under: - 1-arg call (`setHtmlType("$V")`), 2-arg call (`cfg("Content-Type", "$V")`) - single-line subtractor vs multi-line - assignment-bound `$R = new Response();` vs parameter-bound `void doStuff(Response $R)` outer scope - cast vs no-cast receivers - extra sibling `pattern-not-inside` entries for `Const.X_VALUE`-style Java identifiers - `pattern-sanitizers` block added to the sink-lib rule The XSS rule with the same subtractor shape OVER-subtracts (4 FPs cleared but 8 TPs flip to FN — UnsafeNoContentTypeServlet etc. with no `setContentType` call get subtracted). The bug therefore lives in some structural element of that rule not captured here. Strongest remaining suspect: the `mode: join` parent rule wrapping the sink lib — the engine-test framework only loads a single yaml per `@RuleSet`, so cross-file join refs couldn't be reproduced in this test.

…-indent pattern-inside Convert issue 99 from a passing positive control into a failing bug demonstration. A `pattern-inside` placed alongside `pattern-either` (at the same indentation level as the OR, not inside it) makes the sibling pattern a required AND conjunct on the surrounding scope. That conjunct interacts pathologically with a downstream `pattern-not-inside` + `metavariable-regex` subtractor: the `metavariable-regex` constraint is silently dropped and the subtractor over-fires on samples whose content doesn't match the regex. pattern-sinks: - patterns: - patterns: - pattern-either: - pattern-inside: $RT $METHOD(..., Response $R, ...) { ... } - pattern-inside: $RT $METHOD(..., Response2 $R, ...) { ... } # sibling-not-child - metavariable-pattern: $METHOD ∈ {doStuff} - patterns: - pattern-inside: $W = $R.getWriter(); ... - pattern: $W.write($UNTRUSTED) - patterns: - pattern-not-inside: $R.setHtmlType("$V"); ... - metavariable-regex: $V matches '^application/json$' - focus-metavariable: $UNTRUSTED Expected: - PositiveHtmlType fires (setHtmlType("text/html") doesn't match the JSON regex). - NegativeJsonType is subtracted. Actual: PositiveHtmlType is also subtracted — the regex constraint is dropped. Reproduced in three layers during diagnosis of `servlet-xss-html-response-sinks.yaml`'s SafeChainedWriterJsonServlet / SafeOutputStreamOctetServlet FPs: 1. XSS rule's javax/jakarta entrypoint pattern-either has the same sibling-at-wrong-indent shape (the jakarta `pattern-inside` is at the outer level, not inside the OR). 2. Reproducing that shape in an analyzer-pipeline probe rule with a placeholder `Response2` type and the same subtractor flipped PositiveHtmlType to FN under `--debug-run-rule-tests`; removing the sibling restored correct discrimination. 3. Mirroring it here in the engine-test framework reproduces the same over-subtraction. Removing the bug-trigger sibling `pattern-inside` for `Response2` restores the discriminating behavior of the previous positive control; the test is otherwise unchanged. Disabled until the engine either honors the metavariable-regex constraint under this shape or rejects the malformed rule outright.

… sink Move the jakarta `pattern-inside` for the entrypoint constraint into the `pattern-either` it was supposed to alternate with. The jakarta entry was at the same indentation as `pattern-either`, making it a required AND conjunct on every match (jakarta and javax required at the same parameter position — impossible) rather than an OR alternative. Same shape and analogous fix as the earlier `servlet-xss-sinks.yaml` indentation correction (commit ce41b442). XSS test suite is unchanged (65 success / 6 FP / 0 FN) — the engine happens to silently tolerate the malformed sibling pattern-inside on the current sample set, but the YAML now reflects the author's intent and is required before any further `pattern-not-inside` work in this rule can be reasoned about cleanly. See issue 99 for the engine bug that this shape triggers when combined with a downstream `pattern-not-inside` + `metavariable-regex` subtractor.

The single `pattern-not-inside` + `metavariable-pattern` block mixed string-literal alternatives (`"application/json"`, …) with Java identifier alternatives (`MediaType.APPLICATION_JSON_VALUE`, …) under one `metavariable-pattern` constraint on a bare `$CT_SAFE` metavar. Split them along the engine's documented working forms: - String literals use `setContentType("$CT_SAFE")` (metavar inside quotes, binds to the string contents) plus `metavariable-regex` on `$CT_SAFE`. - Java identifiers use `setContentType($CT_CONST)` (bare metavar over the expression) plus `metavariable-pattern` enumerating the `MediaType.X_VALUE` constants. XSS test result is unchanged (65 success / 6 FP / 0 FN) — the multi-statement `pattern-not-inside` in this rule's context is still silently ignored by the engine (separate from the `"$V"` vs `$V` issue tracked in issue 98 / issue 99), so the four servlet FPs remain. But the YAML now matches the engine's working syntactic patterns and will start subtracting correctly once the pattern-not-inside engine bug is resolved.

…tractor Both subtractor blocks (string-literal arg + identifier arg) shared the receiver metavar `$RESPONSE` with each other and with the writer / outputstream sink shapes. Give each subtractor its own local receiver metavar (`$R_STR` and `$R_CONST`) so the engine doesn't have to unify a single binding across all five `(HttpServletResponse $RESPONSE)` positions when evaluating these independent constraints. XSS test result unchanged (65 success / 6 FP / 0 FN); the multi-statement `pattern-not-inside` is still silently ignored in this rule's context regardless of metavar locality, so the four servlet FPs remain. The diff is a structural cleanup ahead of the eventual engine fix.

…ype subtractors Revert the per-subtractor local receiver metavars (`$R_STR` / `$R_CONST`) introduced in 5ad874cc — the metavar locality didn't change the test result and consistency with the entrypoint and sink shapes (which all use `$RESPONSE`) is easier to read. XSS test result unchanged (65 success / 6 FP / 0 FN); confirms the servlet FP bottleneck is in the engine's multi-statement `pattern-not-inside` handling, not in metavar unification.

…TML sink Previously the rule mixed three forms across positions: - parameter-declaration FQN in entrypoint pattern-inside (`javax.servlet.http.HttpServletResponse $RESPONSE` and `jakarta.servlet.http.HttpServletResponse $RESPONSE` as separate `pattern-either` alternatives) - cast-expression FQN in writer/outputstream sinks (`(javax.servlet.http.HttpServletResponse $RESPONSE)` and `(jakarta.servlet.http.HttpServletResponse $RESPONSE)` as separate `pattern-either` alternatives) - short-name cast in sendError and the setContentType subtractors (`(HttpServletResponse $RESPONSE)`) Collapse all five positions to the short-name form `(HttpServletResponse $RESPONSE)` (and `HttpServletResponse $RESPONSE` in the parameter declaration). The engine resolves the short name via the sample's `import` statement, so the single short-name pattern matches both `javax.servlet.http.HttpServletResponse` and `jakarta.servlet.http.HttpServletResponse` users without having to enumerate them. XSS test result unchanged (65 success / 6 FP / 0 FN); no parse errors. Pure structural consolidation — removes 6 lines, drops two 2-alternative `pattern-either` blocks, and keeps the rule visually consistent with the existing `sendError` and subtractor positions that already used the short-name form.

Remove the nested `patterns:` wrappers around the entrypoint constraint, the writer/outputstream sink shapes, and the two `pattern-not-inside` subtractors so that pattern-inside, pattern-either, pattern-not-inside, metavariable-regex, and metavariable-pattern all sit as direct children of the outermost `patterns:` block: - The entrypoint `pattern-inside` and its `metavariable-pattern` constraint on `$ENTRYPOINT` are direct siblings, not wrapped in an inner `patterns:`. - The writer-call and outputstream-call sinks collapse from `pattern-inside: $W = $R.getWriter(); ...` + `pattern: $W.$WRITE(...)` into a single multi-statement `pattern: $W = $R.getWriter(); ... $W.$WRITE(..., $UNTRUSTED, ...);`. - Each `pattern-not-inside` subtractor pairs with its `metavariable-regex` / `metavariable-pattern` constraint as direct siblings, not wrapped in their own `patterns:`. With nested wrappers the engine silently ignored the multi-statement `pattern-not-inside` subtractors; flattened, the engine correctly evaluates the AND of (entrypoint scope) AND (sink shape) AND (no safe setContentType call in scope). XSS rule test result on the full sample set: - success: 65 → 69 (+4) - false positive: 6 → 2 (-4; SafeHtmlServlet, SafeChainedWriterJsonServlet, SafeOutputStreamOctetServlet, SafeGreetingServlet all cleared) - false negative: 0 (unchanged) Remaining 2 FPs (Row12 `.header("Content-Type","application/json")` and Row31 class-level `@RequestMapping(produces=)` inheritance) are separate from the servlet-sink flattening and tracked elsewhere.

@RequestMapping

Mirrors the existing method-level non-HTML produces= subtractor at the enclosing class declaration. Class-level @RequestMapping(produces=...) on a @RestController/@controller now subtracts every handler inside, matching how Spring propagates the content type. Row 31 (class-level produces = "application/json") flips FP → TN; Row 32 (class-level text/html, TP) still fires via the default-dangerous branch.

The split-arity workaround is still in place; the comment now explains the underlying name-metavar intersection problem directly instead of pointing at a numbered engine issue that is being retired.

@disabled

Both tests have been @disabled since they were added; the rule workarounds they motivated (class-level pattern-not-inside subtractor and split-arity writer sinks) are now in place in the XSS rules, so the fixtures no longer track active engine work. Removes the issue98/issue99 java + yaml sources, the i98/i99 helper packages used only by them, and the corresponding @test methods + imports in IssuesTest.

Restructure the two ERROR rules (`xss-in-servlet-app`, `xss-in-spring-app`) to follow the common full-description format used by sqli/ssrf/etc.: brief intro, vulnerable example, safe example(s), and a closing line naming the sink shapes the rule covers. Spring includes a second safe sample using a typed-DTO ResponseEntity so Jackson serializes as JSON. Fix a YAML indentation bug in `response-injection-in-servlet-app` where `references:` and `provenance:` were absorbed into the description block and silently missing from parsed metadata. Rename the NOTE-tier sink files to match their rule ids: `servlet-xss-sinks.yaml` → `servlet-response-injection-sinks.yaml` (inner id `java-servlet-response-injection-sink`) and `spring-xss-sinks.yaml` → `spring-response-injection-sinks.yaml` (inner id `spring-response-injection-sink`); update join refs and README listing.

Rewrite the four join-rule messages so they match the dominant shape used by sqli/ssrf/command-injection/etc.: a single sentence of the form `<vulnerability name>: <one-sentence detail>`. The ERROR variants drop "Potential" (the `text/html` content-type check confirms exploitability) and call out the framework (Servlet vs Spring). The NOTE variants keep "Potential" since no content-type confirmation is available. The two standalone WARNING rules (wicket-xss, xssrequestwrapper-is-insecure) are left as-is since their prescriptive messages are intentionally longer.

…rules Replace the awkward "This rule fires at ERROR severity because…" sentence in both ERROR rules with a declarative "Here the servlet/response content type is `text/html`, so…" framing — the qualifying condition is the interesting fact, not the rule's own severity. Add an imperative call-to-action paragraph to both NOTE rules ("Make sure the response is actually rendered as HTML by the browser…") pointing reviewers to the content-type check that determines whether the finding is exploitable, and naming the ERROR-tier sibling rule that fires when the content type is statically confirmed. Also swap the four remaining "tainted data" mentions in the join-rule messages and Spring ERROR intro to "untrusted input" so they match the wording used by other rules.

Remove every YAML `#` comment and every Java `//` / `/* */` / `/** */` comment from the XSS rule files (servlet-xss-html-response-sinks.yaml, spring-xss-html-response-sinks.yaml, xss.yaml) and the XSS test fixtures under rules/test/src/main/java/security/xss/. The strip is scoped to files changed on this branch since origin/main. Across 14 files this removes 395 lines of design-rationale comments, section dividers, and per-fixture javadocs. Parsed YAML structures are identical pre- and post-strip (block-scalar-aware stripper preserves `#` characters inside full-description text); Java compiles; all `@PositiveRuleSample` / `@NegativeRuleSample` annotations are preserved.

misonijnik force-pushed the misonijnik/xss-rework branch from 34b3a41 to 9be53a7 Compare May 4, 2026 15:45

misonijnik requested a review from dvvrd May 4, 2026 15:45

misonijnik force-pushed the misonijnik/xss-rework branch 6 times, most recently from 404be40 to 36cbb8d Compare May 8, 2026 00:58

misonijnik force-pushed the misonijnik/xss-rework branch from 68c3143 to 52d3e2e Compare May 12, 2026 12:35

Saloed reviewed May 12, 2026

View reviewed changes

misonijnik force-pushed the misonijnik/xss-rework branch from 9a47250 to e669444 Compare May 13, 2026 11:48

dvvrd reviewed May 14, 2026

View reviewed changes

misonijnik added 18 commits May 14, 2026 16:16

chore(core): Add passThrough rules for Spring DeferredResult

8bdcb4c

chore(core): Move DeferredResult passThrough rules to spring-web jar-…

6346a16

…split

feat: add new xss sinks rules

aa1ba47

test: add tests for xss sinks

022f071

misonijnik added 26 commits May 14, 2026 16:16

XSS rules: re-engineering

ca4cb03

XSS rules: drop issue 99 reference from servlet writer comment

a887bad

The split-arity workaround is still in place; the comment now explains the underlying name-metavar intersection problem directly instead of pointing at a numbered engine issue that is being retired.

XSS rules: refactor

c1fabd2

misonijnik force-pushed the misonijnik/xss-rework branch from cdd8e02 to fc10271 Compare May 14, 2026 14:16

misonijnik linked an issue May 14, 2026 that may be closed by this pull request

Split XSS rules by response content type — error for HTML responses, info for the rest #85

Closed

misonijnik merged commit 910476e into main May 14, 2026
5 checks passed

misonijnik deleted the misonijnik/xss-rework branch May 14, 2026 22:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(rules): Tune XSS rules, separate ERROR and NOTE version#113

feat(rules): Tune XSS rules, separate ERROR and NOTE version#113
misonijnik merged 83 commits into
mainfrom
misonijnik/xss-rework

misonijnik commented May 4, 2026

Uh oh!

Saloed May 12, 2026

Uh oh!

Saloed May 12, 2026

Uh oh!

dvvrd May 14, 2026

Uh oh!

misonijnik May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

misonijnik commented May 4, 2026

Uh oh!

Saloed May 12, 2026

Choose a reason for hiding this comment

Uh oh!

Saloed May 12, 2026

Choose a reason for hiding this comment

Uh oh!

dvvrd May 14, 2026

Choose a reason for hiding this comment

Uh oh!

misonijnik May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants