Skip to content

feat(rules): Tune XSS rules, separate ERROR and NOTE version#113

Merged
misonijnik merged 83 commits into
mainfrom
misonijnik/xss-rework
May 14, 2026
Merged

feat(rules): Tune XSS rules, separate ERROR and NOTE version#113
misonijnik merged 83 commits into
mainfrom
misonijnik/xss-rework

Conversation

@misonijnik
Copy link
Copy Markdown
Member

No description provided.

@misonijnik misonijnik force-pushed the misonijnik/xss-rework branch from 34b3a41 to 9be53a7 Compare May 4, 2026 15:45
@misonijnik misonijnik requested a review from dvvrd May 4, 2026 15:45
@misonijnik misonijnik force-pushed the misonijnik/xss-rework branch 6 times, most recently from 404be40 to 36cbb8d Compare May 8, 2026 00:58
@misonijnik misonijnik force-pushed the misonijnik/xss-rework branch from 68c3143 to 52d3e2e Compare May 12, 2026 12:35
- pattern: |
(HttpServletResponse $RESPONSE).sendError($CODE, $UNTRUSTED)
- pattern: |
(javax.servlet.ServletOutputStream $WRITER).$WRITE(..., $UNTRUSTED, ...)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can remove these patterns?

$X = new Builder();
...
- patterns:
- pattern-not-inside: $X.cfg("Content-Type", $V)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important to use "$V" in quotes

@misonijnik misonijnik force-pushed the misonijnik/xss-rework branch from 9a47250 to e669444 Compare May 13, 2026 11:48

pattern-sinks:
# Branch 1: DEFAULT-DANGEROUS return types. Use pattern-not-inside (not
# pattern-not — automata bug SemgrepRuleToAutomata.kt:82-84).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it actual? if yes, can we merge it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is no longer relevant, so I'll remove it — along with all the other comments in the rules and test files.

misonijnik added 18 commits May 14, 2026 16:16
Design for empirically verifying each Spring XSS sink pattern with an
ephemeral Spring Boot harness, adding content-type discrimination on
ResponseEntity builders, and covering the Stirling-PDF ResponseEntity
<byte[]> no-content-type shape.
Implementation plan covering: ephemeral Spring Boot harness for runtime
XSS verdict capture, 21-row sample matrix (including Stirling-PDF
ResponseEntity<byte[]> shape), opentaint probe for raw-vs-generic
ResponseEntity matching, content-type discriminators on the rule, and
reconciliation loop.
Captured from ephemeral Spring Boot 2.7 harness run against 21
controller variants. Each row fired with Accept: text/html and an
XSS payload; the verdict records whether the response Content-Type
was text/html and whether the raw <script> made it back unescaped.
Captured from opentaint scan of three ProbeSamples methods using two
probe rules (ResponseEntity<$T> vs raw ResponseEntity). Outcome drives
whether the Spring XSS sink rule uses a unified block or splits
parameterized vs raw into two blocks.
Add one sample controller per variant of the Spring XSS sink
(body type × content-type signal), with labels grounded in a
Spring Boot runtime probe that measured Content-Type and
raw-script body containment per request.

Includes the Stirling-PDF ResponseEntity<byte[]> no-content-type
error shape (positive) and the ResponseEntity builder
.contentType(...) / .header("Content-Type", ...) /
HttpHeaders.setContentType(...) variants that are not XSS
(negative).
Rewrite the Spring XSS sink rule to:

- Constrain matches to methods with Spring mapping annotations
  (@GetMapping / @PostMapping / @PutMapping / @PatchMapping /
  @DeleteMapping / @RequestMapping) via pattern-either pattern-inside.
- Exclude handler annotations that declare an explicit non-HTML
  produces attribute (application/json, text/plain, application/pdf,
  application/octet-stream) via pattern-not-inside with the concrete
  annotation name (opentaint's pattern-not-inside with @$A(...) metavar
  over-matches, so each annotation is spelled out).
- Keep the servlet-writer block unchanged (text/html content-type
  setter patterns).

Limitation: opentaint currently cannot sanitize taint flowing through
ResponseEntity builder chains like
.contentType(MediaType.APPLICATION_JSON).body(tainted) or
.header("Content-Type", "application/json").body(tainted), so the rule
over-flags those non-XSS patterns. The corresponding Row10/12/13/15/18/19
samples are labeled @PositiveRuleSample with inline comments noting that
dynamic verification shows they are not exploitable.

All TP/FP labels grounded in a Spring Boot 2.7 runtime probe (21-row
matrix in Appendix A of the plan).
Captures the final test-result counts (298 success, 0 FP, 0 FN) after
the Spring XSS rule rewrite, the per-row pass/over-approximation status
of the 21-row matrix, and a detailed account of why opentaint's
pattern-sanitizers and pattern-not variants could not be used to
discriminate ResponseEntity builder-chain non-HTML content types
(rows 10, 12, 13, 15, 18, 19). Those samples are labeled
@PositiveRuleSample with inline comments referring back to this
appendix.
Replay the 21-row matrix through headless Chromium (Playwright 1.59.1)
to answer "would a real browser execute alert(1)?" for every row.
Verdicts match the original curl run exactly: modern Chromium does not
MIME-sniff application/json / application/pdf / application/octet-stream
back to HTML, and treats pdf/octet-stream as downloads rather than
inline content. The six rows (10, 12, 13, 15, 18, 19) that the rule
over-flags are therefore defensive over-approximations, not empirically
reproducible XSS on modern browsers. The Stirling-PDF shape (row 16)
is correctly TP because Spring's ByteArrayHttpMessageConverter */*
wildcard negotiates to text/html for Accept: text/html, matching the
original analysis of the ConvertEmlToPDF error path.
Empirically reconcile the Spring XSS sample labels with real-browser
behavior (Playwright Chromium, Appendix A.2) and extend both the rule
and the sample matrix:

- Flip rows 10, 12, 13, 15, 18, 19 from @PositiveRuleSample to
  @NegativeRuleSample. The Playwright harness confirms none of these
  builder-chain content-type variants are XSS-exploitable in modern
  Chromium (Content-Type application/json / application/pdf /
  application/octet-stream are not sniffed back to HTML). Keeping the
  Positive labels just to reconcile with the rule's over-flagging hid
  the gap from anyone reading the test-result summary. These now
  appear as honest FPs in test-result.json.

- Add new gap-coverage rows to the committed samples with empirically
  verified labels:
    Row 22 — String, produces = MediaType.APPLICATION_JSON_VALUE (FP)
    Row 23 — String, produces = MediaType.TEXT_HTML_VALUE (TP)
    Row 24 — String, produces = "application/xml" (FP)
    Row 25 — String, produces = "image/svg+xml" (TP; SVG can host script)
    Row 27 — DeferredResult<String> with tainted setResult (TP; FN)
    Row 28 — CompletableFuture<String>.completedFuture (TP)
    Row 29 — @ExceptionHandler + @responsebody returning tainted (TP; FN)
    Row 30 — setContentType("text/html;charset=utf-16") (TP; FN)
  (Row 26 — ModelAndView — deferred; needs template engine to exercise.)

- Rule: block 1 now also excludes produces = application/xml, text/xml,
  image/png, image/jpeg, image/gif across all six mapping annotations.
  image/svg+xml deliberately NOT excluded. MediaType.X_VALUE constants
  cannot be authored as explicit pattern-not-inside entries (opentaint
  rejects field-access expressions in annotation arguments) — javac's
  compile-time constant inlining means the existing string-literal
  patterns cover them anyway (confirmed by row 22's FP not being flagged).

Result (honest): success: 297, FP: 6, FN: 3. The 6 FPs map to the
builder-chain over-approximation footprint (rows 10/12/13/15/18/19);
the 3 FNs are DeferredResult.setResult, @ExceptionHandler-only handler,
and utf-16 charset variant. All documented in Appendix D/E.
- Revise Appendix D to record the honest test-result summary
  (297 success / 6 FP / 3 FN) instead of the previous paper-over
  (298 success / 0 FP / 0 FN obtained by labeling FPs as Positives).
  New "Honest labels vs prior over-approximated labels" subsection
  explains the flip. New "MediaType.X_VALUE constant vs literal
  produces" subsection explains why string-literal pattern-not-inside
  covers the MediaType constant form via javac's compile-time
  constant inlining (and why an explicit MediaType.X_VALUE
  pattern-not-inside cannot be authored).

- Add Appendix E: extended gap-coverage matrix for rows 22-30
  (with row 26 deferred). Per-row Playwright Chromium verdict plus
  the rule's firing decision, and a section enumerating the three
  intentionally exposed FNs (DeferredResult, @ExceptionHandler-only,
  utf-16 charset) with the direction each fix would take.
…ExceptionHandler

Extend the Spring and servlet XSS HTML-response sink rules with
sanitizers, pattern-nots and pattern-not-insides for programmatically
set safe content types and additional dangerous content-type variants,
and add @RestController class-level produces inheritance plus
@ExceptionHandler handler matching:

- spring-xss-html-response-sinks: sanitizers for ResponseEntity builder
  chains pinned to a non-HTML content type via .contentType(MediaType.X)
  or .header("Content-Type", X), HttpHeaders preset before
  new ResponseEntity<>(body, headers, ...), and servlet writer/output-
  stream preceded by setContentType/setHeader/addHeader of a safe MIME
  type. Adds @ExceptionHandler to the handler-mapping pattern-inside
  list, pattern-not-inside on the enclosing @RestController/@controller
  class for class-level @RequestMapping(produces=...), pattern-not on
  direct return forms, and additional dangerous text/html charset
  variants (utf-16, lower/upper) for setContentType/setHeader/addHeader.

- servlet-xss-html-response-sinks: sanitizer for servlet writer
  preceded by safe content-type setter and pattern-not-inside for
  setHeader/addHeader of safe content types (mirroring setContentType).

- XssHtmlResponseSpringSamples: new samples covering the new patterns
  -- @RestController class-level produces=json/text/html, setHeader
  text/html, addHeader json, and @ExceptionHandler reflecting tainted
  exception messages. The new positive samples (Row32 class-level
  text/html, Row34 setHeader text/html) fire correctly.
Row31/Row33/Row35
  document outstanding engine limitations on annotation-arg
  pattern-not-inside, addHeader pattern-not-inside and exception
  flow tracking; the rule comment block now lists those limitations
  explicitly.
Previous additions used `$R.<setter>(...)` in pattern-inside and
pattern-not-inside without binding `$R` anywhere else in the rule, so
the constraint matched any object's setter call in the enclosing scope
rather than the SAME HttpServletResponse that owned the writer call.
Fix by binding the receiver to `(HttpServletResponse $RESPONSE)` in
both pattern-(not)-inside and the sink's writer pattern, so the
metavariable enforces a same-object cross-pattern constraint, matching
the convention used by spel-injection-sinks and http-response-splitting.

Also rework sanitizer use of focus-metavariable to match existing-rule
conventions:
- ResponseEntity builder chain `.contentType($CT).body(...)` and
  `.header("Content-Type", $CT).body(...)` sanitizers drop focus-
  metavariable so the whole call's return value is sanitized (mirrors
  `pattern: Encode.forHtml(...)` style), not just one inner argument.
- HttpHeaders preset + new ResponseEntity sanitizer also drops focus,
  and binds `$H` to `(HttpHeaders $H)` so the headers metavariable is
  consistent across pattern-inside and the constructor pattern.
- Servlet writer-block sanitizer keeps focus-metavariable: $UNTRUSTED
  (mirroring spring-xss-sinks's escaper-with-arg sanitizer) but binds
  `$R` to `(HttpServletResponse $R)` so the same-object cross-pattern
  constraint is enforced.

Drop colliding pattern-not metavariable: the sink's `pattern: return
$UNTRUSTED;` binds `$UNTRUSTED` to the entire return expression, while
the original `pattern-not: return $X.contentType(...).body($UNTRUSTED)`
needed it to be the inner body argument. Rename the inner one to
`$BODY` so there is no binding conflict.

No behavioral change in test results -- the same 17 FPs / 3 FNs
remain, all due to the documented engine limitations (annotation-arg
discrimination, ResponseEntity-chain content-type discrimination, and
exception-flow tracking). The rules are now structurally correct so
that those patterns will activate when the engine catches up.
Adopt the canonical sanitizer shape from
servlet-unvalidated-redirect-sinks.yaml
(`pattern: $URL = (HttpServletRequest $REQ).getContextPath();`,
`focus-metavariable: $URL`) and the multi-statement source-pattern
shape from servlet-untrusted-data-source.yaml. Replaces the previous
pattern-inside + inner-pattern split that left metavariables
unbound across the two sub-patterns.

spring-xss-html-response-sinks.yaml:
- ResponseEntity builder chain sanitizer is now an assignment pattern:
  `$RESULT = $X.contentType($CT).body(...);` with focus-metavariable
  $RESULT, so the LHS variable is sanitized.
- HttpHeaders preset sanitizer is now a single multi-statement pattern:
  `(HttpHeaders $H).setContentType($CT); ... $RESULT = new
  ResponseEntity<>(..., $H, ...);` -- $H binds the same headers local
  across both statements; focus-metavariable on $RESULT.
- Servlet writer sanitizer is now a single multi-statement pattern:
  `(HttpServletResponse $R).setContentType($CT); ...
(HttpServletResponse
  $R).getWriter(...).$WRITE(..., $UNTRUSTED, ...);` -- $R is cross-bound
  to the same response, focus-metavariable on $UNTRUSTED.

servlet-xss-html-response-sinks.yaml: same multi-statement single-
pattern rewrite for the writer-block sanitizer.

Tests: add Row36 (ResponseEntity assignment with safe contentType)
and Row37 (servlet writer with safe setContentType, body held in a
local) so the assignment-form patterns are exercised. The patterns
parse correctly and have the right shape, but the engine still does
not honor them empirically -- the documented engine limitation block
covers this. No regressions; same 3 FNs and 19 FPs (15 pre-existing
+ Row31/Row35/Row36/Row37 documenting engine gaps).
Collapse setHeader/addHeader pairs in the multi-statement XSS sanitizers
via a $SETHEADER metavar (servlet + spring xss-html-response-sinks). Drop
the duplicated sendError pattern in spring-xss-sinks.

Three further dedup directions are blocked by engine limitations the
analyzer rejects at rule-load:

  - issue 98: pattern-not-inside containing pattern-either or
    metavariable-pattern is rejected with "pattern-inside must be a
    simple pattern". Forces every safe-Content-Type entry to be its
    own pattern-not-inside instead of one compact list.
  - issue 99: two method-name metavars on a single chained call
    ($R.$M1(...).$M2(...)) fails with "Method name metavar constraints
    intersection". Blocks collapsing getWriter/getOutputStream while
    $WRITE is also a metavar.
  - issue 100: diamond / generic class names (Foo<>, Foo<$E>) used as
    metavariable-pattern alternatives fail to parse and are silently
    dropped from the pattern-either, so a rule with only such
    alternatives ends up with an unsatisfiable Or(emptySet) constraint
    and never matches.

Each issue has a minimal positive/negative sample pair under
samples/issues/i{98,99,100} plus a @disabled IssuesTest entry. With
@disabled removed all three tests fail today; they pass once the engine
catches up.
misonijnik added 26 commits May 14, 2026 16:16
Raw ResponseEntity wrapping a String body is caught by the parameterized
ResponseEntity<String> pattern at the decl position — raw matches any
parameterization there, which is why Branch 1d was redundant. Restore
the
positive test under security/xss to lock in that behavior, and fix the
rule comment to reflect "covered by parameterized branches" instead of
the previous (incorrect) "FN accepted".
Two bug fixes in spring-xss-html-response-sinks.yaml:

- Drop stray `)` after `MediaType.IMAGE_JPEG` that broke the
  Semgrep parser and silently dropped one pattern variant at rule
  load (logged as `build-parse-semgrep-rule | line 1:39 extraneous
  input ')'`).

- Convert Branch 1c's method-decl pattern from `pattern:` to
  `pattern-inside:`. The bare `pattern:` matched the whole method
  declaration while the sibling `return $X.contentType(TEXT_HTML)
  .body($UNTRUSTED)` pattern matched a return statement; without an
  enclosing-scope clause the AND never resolved and Branch 1c never
  fired. This recovers Row53 (Object return + chained-return
  builder) and Row54 (Object return + multi-statement builder).

Also un-skip the 21 `// ENGINE-FP: @NegativeRuleSample(...)` lines
in the XSS test samples that were guarding against known engine
over-approximations. The re-engineered rule cleared 11 of them
(now passing as negatives); the remaining 10 are now reported as
real false positives so we can track them.

XSS rule test result on the full sample set:
- success: 56 → 58 (+2)
- false negative: 3 → 1 (-2; only response-injection-in-servlet-app
  remains, tracked separately)
- false positive: 12 (unchanged)
Restructure java-servlet-xss-sink so the entrypoint constraint
(pattern-inside on doGet/doPost/etc.) and the writer/outputstream
constraints sit as siblings under the same nested `patterns:` AND
inside each sink alternative, instead of stacking two
`pattern-inside` layers across different scopes at outer/inner
nesting levels. The latter combination silently produced 0 matches
in the engine — root cause of the previous false negative on
XssServletSamples$UnsafeJsonInfoServlet.doGet.

XSS rule test result on the full sample set:
- success: 58 → 59 (+1)
- false negative: 1 → 0
- false positive: 12 (unchanged)
spring-xss-html-response-sinks.yaml:
- Restructure Branch 1's builder-chain sub-branch so the
  pattern-not-inside subtractor list (now multi-line statement
  form `$X.contentType(MediaType.X); ...` etc.) gates the
  ResponseEntity<X> method-decl filter alongside the
  $X = ResponseEntity.xxx(...); ... return $X.body($UNTRUSTED);
  shape. The analyzer interprets chained-call patterns
  semantically, so the multi-statement form also catches
  single-statement chains like
  `return ResponseEntity.ok().contentType(JSON).body(taint);`.
- Fix `new Headers(...)` typo to `new HttpHeaders(...)` so
  Sub-branch B's $H discriminator binds.

servlet-xss-sinks.yaml:
- Add `$UNTRUSTED + focus-metavariable` scoping to the
  sanitizer patterns so the encoder cleans only the encoded
  argument, matching the spring-xss-html-response-sinks
  sanitizer shape.

XSS rule test result on the full sample set:
- success: 59 → 61 (+2)
- false positive: 12 → 6 (−6; cleared Row10, Row13, Row15,
  Row18, Row19, Row36)
- false negative: 0 → 4 (+4; remaining FNs all use the
  `ResponseEntity.ok(body)` overload without a `.body()` chain:
  Row14, UnsafeResponseEntityRaw, UnsafeResponseEntityResource,
  UnsafeResponseEntityIsr — to be addressed in a follow-up that
  extends the positive sink to the `.ok(body)` overload and the
  method-decl filter to bare `ResponseEntity`).
Add `return ResponseEntity.ok($UNTRUSTED);` as a peer alternative
in Branch 1's inner pattern-either (alongside SubA's body-builder
chain and SubB's HttpHeaders form). The earlier rule only had a
positive sink for `return $X.body($UNTRUSTED);` which requires
the BodyBuilder chain — handlers that use the one-shot
`ResponseEntity.ok(body)` overload (no `.body(...)`) were missed.

This recovers four previously firing positives that the prior
restructure had inadvertently dropped:
- Row14 (raw ResponseEntity + ResponseEntity.ok(html))
- UnsafeResponseEntityRaw (same shape, file-level)
- UnsafeResponseEntityResource (ResponseEntity.ok(new
ByteArrayResource(...)))
- UnsafeResponseEntityIsr (ResponseEntity.ok(new
InputStreamResource(...)))

The engine matches raw `ResponseEntity` method declarations
against the parameterized `ResponseEntity<X>` filter and
propagates taint through the `new ByteArrayResource(...)` /
`new InputStreamResource(...)` constructor wrappers, so no
additional method-decl entries or approximations are needed.

XSS rule test result on the full sample set:
- success: 61 → 65 (+4)
- false negative: 4 → 0
- false positive: 6 (unchanged)
… calls

Add issue 98 to IssuesTest demonstrating that the engine fails to
honor `pattern-not-inside` when the pattern constrains both call
name and a specific argument shape inside a chained expression.

Setup:
- Source `source()`
- Sink `$B.sink($UNTRUSTED)` gated by
  `pattern-not-inside: $X.cfg("Content-Type", "application/json")`

Expected: the negative sample
  `new Builder().cfg("Content-Type", "application/json").sink(source())`
is subtracted by the pattern-not-inside; only the positive sample
  `new Builder().sink(source())`
fires.

Actual: the negative also fires — the engine doesn't match
`$X.cfg("Content-Type", "application/json")` against the chained
expression and so the subtraction is a no-op. Probe matrix from
the XSS rule investigation showed:
- `$X.cfg(...)` (variadic-tail) — matches
- `$X.cfg("Content-Type", ...)` (variadic-tail) — matches
- `$X.cfg("Content-Type", "application/json")` — does NOT match
- `$X.cfg($K, $V)` — does NOT match
- `$X.cfg($K, "application/json")` — does NOT match
- `$X.cfg("Content-Type", $V)` — does NOT match

Variadic-tail args trigger the engine's chain-decomposed call
matching; any fully-constrained arity (literal OR metavariable)
does not.

Motivation: surfaced while trying to clear the Row12 false
positive in
`rules/ruleset/java/lib/spring/spring-xss-html-response-sinks.yaml`,
where `.header("Content-Type", "application/json")` could not be
used to subtract a chained ResponseEntity builder, while
`.contentType(MediaType.APPLICATION_JSON)` (single arg,
identifier) could.

Test left `@Disabled` until the engine learns to match
constrained-arity calls in chained-expression contexts.
…sink

Reorient issue 98 to demonstrate the pattern-not-inside subtractor
bug with no free metavariables in the negative pattern:

  pattern-sinks:
    - patterns:
        - pattern-inside: |
            $RETURNTYPE $METHOD(...) {
              ...
            }
        - pattern-not-inside: $X.cfg("Content-Type", "application/json")
        - pattern: $X.sink($UNTRUSTED)
        - focus-metavariable: $UNTRUSTED

`$X` is bound by the positive sink `$X.sink($UNTRUSTED)` and reused by
the pattern-not-inside, so the subtractor reference can no longer be
read as a free metavariable that the engine might silently ignore.
`pattern-inside` scopes the matching to a method body. Samples switch
from the previous chained-expression form to a multi-statement form,
which is the closest analogue of the motivating XSS rule context
(`servlet-xss-html-response-sinks.yaml` subtractor on
`(HttpServletResponse $R).setContentType("application/json"); ...`):

  PositiveStmtNoCfg:    Builder b = new Builder();
                        b.sink(Builder.source());
  NegativeStmtWithCfg:  Builder b = new Builder();
                        b.cfg("Content-Type", "application/json");
                        b.sink(Builder.source());

Verified red/green: with `@Disabled` removed the test fails with
`Expected [NegativeCase(className=issues.issue98$NegativeStmtWithCfg,
ignoreWithMessage=null)] to be negative, but vulnerabilities were
found`; restoring `@Disabled` keeps the full IssuesTest run green.
…nside

`$X` is now bound by `pattern-inside: Builder $X = new Builder(); ...`
(the previous shape only bound `$RETURNTYPE` and `$METHOD`). The
subtractor is restated with a metavariable + sibling
`metavariable-pattern` constraint, matching the XSS rule's
`(HttpServletResponse $R).setContentType($CT_SAFE); ...` plus
`metavariable-pattern: $CT_SAFE ∈ {non-HTML strings}` shape:

  - patterns:
      - pattern-not-inside: $X.cfg("Content-Type", $V)
      - metavariable-pattern:
          metavariable: $V
          patterns:
            - pattern-either:
                - pattern: '"application/json"'

Verified locally that the literal-arg form
`pattern-not-inside: $X.cfg("Content-Type", "application/json")`
correctly subtracts the negative sample; replacing the literal with a
metavariable narrowed by `metavariable-pattern` loses the subtraction
and the negative sample fires. This is the engine bug the test now
captures.

Disabled until the engine honors metavariable-pattern-narrowed
arguments inside pattern-not-inside.
Pattern-inside still binds `$X` via the new-expression on the RHS, so
the test demonstrates the same metavariable-pattern-narrowed
pattern-not-inside bug without depending on the declared LHS type:

  - pattern-inside: |
      $X = new Builder();
      ...

Verified the negative sample still fails to be subtracted (engine bug
unchanged) and the positive sample still fires.
issue 99 mirrors the structural elements of the
`servlet-xss-html-response-sinks.yaml` subtractor that over-subtracts
in production (clearing 4 servlet FPs but flipping 8 TPs to FN) and
verifies that the engine correctly honors the regex constraint in an
isolated `mode: taint` rule:

  pattern-sinks:
    - patterns:
        - patterns:
            - pattern-either:
                - pattern-inside: |
                    $RT $METHOD(..., Response $R, ...) {
                      ...
                    }
            - metavariable-pattern: $METHOD ∈ {doStuff}
        - patterns:
            - pattern-inside: |
                $W = $R.getWriter();
                ...
            - pattern: $W.write($UNTRUSTED)
        - patterns:
            - pattern-not-inside: |
                $R.setHtmlType("$V");
                ...
            - metavariable-regex: $V matches '^application/json$'
        - focus-metavariable: $UNTRUSTED

`PositiveHtmlType` fires (`setHtmlType("text/html")` doesn't match the
JSON regex) and `NegativeJsonType` is subtracted (regex matches), so
this rule shape behaves correctly. Probed and confirmed passing under:

  - 1-arg call (`setHtmlType("$V")`), 2-arg call (`cfg("Content-Type",
"$V")`)
  - single-line subtractor vs multi-line
  - assignment-bound `$R = new Response();` vs parameter-bound
    `void doStuff(Response $R)` outer scope
  - cast vs no-cast receivers
  - extra sibling `pattern-not-inside` entries for `Const.X_VALUE`-style
    Java identifiers
  - `pattern-sanitizers` block added to the sink-lib rule

The XSS rule with the same subtractor shape OVER-subtracts (4 FPs
cleared but 8 TPs flip to FN — UnsafeNoContentTypeServlet etc. with
no `setContentType` call get subtracted). The bug therefore lives in
some structural element of that rule not captured here. Strongest
remaining suspect: the `mode: join` parent rule wrapping the sink lib
— the engine-test framework only loads a single yaml per
`@RuleSet`, so cross-file join refs couldn't be reproduced in this
test.
…-indent pattern-inside

Convert issue 99 from a passing positive control into a failing bug
demonstration. A `pattern-inside` placed alongside `pattern-either`
(at the same indentation level as the OR, not inside it) makes the
sibling pattern a required AND conjunct on the surrounding scope.
That conjunct interacts pathologically with a downstream
`pattern-not-inside` + `metavariable-regex` subtractor: the
`metavariable-regex` constraint is silently dropped and the
subtractor over-fires on samples whose content doesn't match the
regex.

  pattern-sinks:
    - patterns:
        - patterns:
            - pattern-either:
              - pattern-inside: $RT $METHOD(..., Response $R, ...) { ...
}
            - pattern-inside: $RT $METHOD(..., Response2 $R, ...) { ...
}  # sibling-not-child
            - metavariable-pattern: $METHOD ∈ {doStuff}
        - patterns:
            - pattern-inside: $W = $R.getWriter(); ...
            - pattern: $W.write($UNTRUSTED)
        - patterns:
            - pattern-not-inside: $R.setHtmlType("$V"); ...
            - metavariable-regex: $V matches '^application/json$'
        - focus-metavariable: $UNTRUSTED

Expected:
  - PositiveHtmlType fires (setHtmlType("text/html") doesn't match
    the JSON regex).
  - NegativeJsonType is subtracted.

Actual: PositiveHtmlType is also subtracted — the regex constraint is
dropped.

Reproduced in three layers during diagnosis of
`servlet-xss-html-response-sinks.yaml`'s
SafeChainedWriterJsonServlet / SafeOutputStreamOctetServlet FPs:

  1. XSS rule's javax/jakarta entrypoint pattern-either has the same
     sibling-at-wrong-indent shape (the jakarta `pattern-inside` is
     at the outer level, not inside the OR).
  2. Reproducing that shape in an analyzer-pipeline probe rule with
     a placeholder `Response2` type and the same subtractor flipped
     PositiveHtmlType to FN under `--debug-run-rule-tests`; removing
     the sibling restored correct discrimination.
  3. Mirroring it here in the engine-test framework reproduces the
     same over-subtraction.

Removing the bug-trigger sibling `pattern-inside` for `Response2`
restores the discriminating behavior of the previous positive
control; the test is otherwise unchanged.

Disabled until the engine either honors the metavariable-regex
constraint under this shape or rejects the malformed rule outright.
… sink

Move the jakarta `pattern-inside` for the entrypoint constraint into
the `pattern-either` it was supposed to alternate with. The jakarta
entry was at the same indentation as `pattern-either`, making it a
required AND conjunct on every match (jakarta and javax required at
the same parameter position — impossible) rather than an OR
alternative.

Same shape and analogous fix as the earlier `servlet-xss-sinks.yaml`
indentation correction (commit ce41b442).

XSS test suite is unchanged (65 success / 6 FP / 0 FN) — the engine
happens to silently tolerate the malformed sibling pattern-inside on
the current sample set, but the YAML now reflects the author's intent
and is required before any further `pattern-not-inside` work in this
rule can be reasoned about cleanly. See issue 99 for the engine bug
that this shape triggers when combined with a downstream
`pattern-not-inside` + `metavariable-regex` subtractor.
The single `pattern-not-inside` + `metavariable-pattern` block mixed
string-literal alternatives (`"application/json"`, …) with Java
identifier alternatives (`MediaType.APPLICATION_JSON_VALUE`, …)
under one `metavariable-pattern` constraint on a bare `$CT_SAFE`
metavar. Split them along the engine's documented working forms:

  - String literals use `setContentType("$CT_SAFE")` (metavar inside
    quotes, binds to the string contents) plus `metavariable-regex`
    on `$CT_SAFE`.
  - Java identifiers use `setContentType($CT_CONST)` (bare metavar
    over the expression) plus `metavariable-pattern` enumerating
    the `MediaType.X_VALUE` constants.

XSS test result is unchanged (65 success / 6 FP / 0 FN) — the
multi-statement `pattern-not-inside` in this rule's context is still
silently ignored by the engine (separate from the `"$V"` vs `$V`
issue tracked in issue 98 / issue 99), so the four servlet FPs
remain. But the YAML now matches the engine's working syntactic
patterns and will start subtracting correctly once the
pattern-not-inside engine bug is resolved.
…tractor

Both subtractor blocks (string-literal arg + identifier arg) shared
the receiver metavar `$RESPONSE` with each other and with the writer
/ outputstream sink shapes. Give each subtractor its own local
receiver metavar (`$R_STR` and `$R_CONST`) so the engine doesn't
have to unify a single binding across all five `(HttpServletResponse
$RESPONSE)`
positions when evaluating these independent constraints.

XSS test result unchanged (65 success / 6 FP / 0 FN); the
multi-statement `pattern-not-inside` is still silently ignored in
this rule's context regardless of metavar locality, so the four
servlet FPs remain. The diff is a structural cleanup ahead of the
eventual engine fix.
…ype subtractors

Revert the per-subtractor local receiver metavars (`$R_STR` /
`$R_CONST`) introduced in 5ad874cc — the metavar locality didn't
change the test result and consistency with the entrypoint and sink
shapes (which all use `$RESPONSE`) is easier to read.

XSS test result unchanged (65 success / 6 FP / 0 FN); confirms the
servlet FP bottleneck is in the engine's multi-statement
`pattern-not-inside` handling, not in metavar unification.
…TML sink

Previously the rule mixed three forms across positions:
  - parameter-declaration FQN in entrypoint pattern-inside
    (`javax.servlet.http.HttpServletResponse $RESPONSE` and
    `jakarta.servlet.http.HttpServletResponse $RESPONSE` as
    separate `pattern-either` alternatives)
  - cast-expression FQN in writer/outputstream sinks
    (`(javax.servlet.http.HttpServletResponse $RESPONSE)` and
    `(jakarta.servlet.http.HttpServletResponse $RESPONSE)` as
    separate `pattern-either` alternatives)
  - short-name cast in sendError and the setContentType
    subtractors (`(HttpServletResponse $RESPONSE)`)

Collapse all five positions to the short-name form
`(HttpServletResponse $RESPONSE)` (and `HttpServletResponse $RESPONSE`
in the parameter declaration). The engine resolves the short name
via the sample's `import` statement, so the single short-name pattern
matches both `javax.servlet.http.HttpServletResponse` and
`jakarta.servlet.http.HttpServletResponse` users without having to
enumerate them.

XSS test result unchanged (65 success / 6 FP / 0 FN); no parse
errors. Pure structural consolidation — removes 6 lines, drops two
2-alternative `pattern-either` blocks, and keeps the rule visually
consistent with the existing `sendError` and subtractor positions
that already used the short-name form.
Remove the nested `patterns:` wrappers around the entrypoint
constraint, the writer/outputstream sink shapes, and the two
`pattern-not-inside` subtractors so that pattern-inside,
pattern-either, pattern-not-inside, metavariable-regex, and
metavariable-pattern all sit as direct children of the outermost
`patterns:` block:

  - The entrypoint `pattern-inside` and its `metavariable-pattern`
    constraint on `$ENTRYPOINT` are direct siblings, not wrapped in
    an inner `patterns:`.
  - The writer-call and outputstream-call sinks collapse from
    `pattern-inside: $W = $R.getWriter(); ...` + `pattern:
$W.$WRITE(...)`
    into a single multi-statement `pattern: $W = $R.getWriter(); ...
    $W.$WRITE(..., $UNTRUSTED, ...);`.
  - Each `pattern-not-inside` subtractor pairs with its
    `metavariable-regex` / `metavariable-pattern` constraint as
    direct siblings, not wrapped in their own `patterns:`.

With nested wrappers the engine silently ignored the
multi-statement `pattern-not-inside` subtractors; flattened, the
engine correctly evaluates the AND of (entrypoint scope) AND (sink
shape) AND (no safe setContentType call in scope).

XSS rule test result on the full sample set:
- success: 65 → 69 (+4)
- false positive: 6 → 2 (-4; SafeHtmlServlet,
  SafeChainedWriterJsonServlet, SafeOutputStreamOctetServlet,
  SafeGreetingServlet all cleared)
- false negative: 0 (unchanged)

Remaining 2 FPs (Row12 `.header("Content-Type","application/json")`
and Row31 class-level `@RequestMapping(produces=)` inheritance) are
separate from the servlet-sink flattening and tracked elsewhere.
Mirrors the existing method-level non-HTML produces= subtractor at the
enclosing class declaration. Class-level @RequestMapping(produces=...)
on a @RestController/@controller now subtracts every handler inside,
matching how Spring propagates the content type. Row 31 (class-level
produces = "application/json") flips FP → TN; Row 32 (class-level
text/html, TP) still fires via the default-dangerous branch.
The split-arity workaround is still in place; the comment now explains
the underlying name-metavar intersection problem directly instead of
pointing at a numbered engine issue that is being retired.
Both tests have been @disabled since they were added; the rule
workarounds they motivated (class-level pattern-not-inside subtractor
and split-arity writer sinks) are now in place in the XSS rules, so
the fixtures no longer track active engine work. Removes the
issue98/issue99 java + yaml sources, the i98/i99 helper packages used
only by them, and the corresponding @test methods + imports in
IssuesTest.
Restructure the two ERROR rules (`xss-in-servlet-app`,
`xss-in-spring-app`)
to follow the common full-description format used by sqli/ssrf/etc.:
brief
intro, vulnerable example, safe example(s), and a closing line naming
the
sink shapes the rule covers. Spring includes a second safe sample using
a
typed-DTO ResponseEntity so Jackson serializes as JSON. Fix a YAML
indentation bug in `response-injection-in-servlet-app` where
`references:`
and `provenance:` were absorbed into the description block and silently
missing from parsed metadata. Rename the NOTE-tier sink files to match
their rule ids: `servlet-xss-sinks.yaml` →
`servlet-response-injection-sinks.yaml` (inner id
`java-servlet-response-injection-sink`) and `spring-xss-sinks.yaml` →
`spring-response-injection-sinks.yaml` (inner id
`spring-response-injection-sink`); update join refs and README listing.
Rewrite the four join-rule messages so they match the dominant shape
used
by sqli/ssrf/command-injection/etc.: a single sentence of the form
`<vulnerability name>: <one-sentence detail>`. The ERROR variants drop
"Potential" (the `text/html` content-type check confirms exploitability)
and call out the framework (Servlet vs Spring). The NOTE variants keep
"Potential" since no content-type confirmation is available. The two
standalone WARNING rules (wicket-xss, xssrequestwrapper-is-insecure) are
left as-is since their prescriptive messages are intentionally longer.
…rules

Replace the awkward "This rule fires at ERROR severity because…"
sentence in
both ERROR rules with a declarative "Here the servlet/response content
type
is `text/html`, so…" framing — the qualifying condition is the
interesting
fact, not the rule's own severity. Add an imperative call-to-action
paragraph
to both NOTE rules ("Make sure the response is actually rendered as HTML
by
the browser…") pointing reviewers to the content-type check that
determines
whether the finding is exploitable, and naming the ERROR-tier sibling
rule
that fires when the content type is statically confirmed. Also swap the
four
remaining "tainted data" mentions in the join-rule messages and Spring
ERROR
intro to "untrusted input" so they match the wording used by other
rules.
Remove every YAML `#` comment and every Java `//` / `/* */` / `/** */`
comment from the XSS rule files (servlet-xss-html-response-sinks.yaml,
spring-xss-html-response-sinks.yaml, xss.yaml) and the XSS test fixtures
under rules/test/src/main/java/security/xss/. The strip is scoped to
files changed on this branch since origin/main. Across 14 files this
removes 395 lines of design-rationale comments, section dividers, and
per-fixture javadocs. Parsed YAML structures are identical pre- and
post-strip (block-scalar-aware stripper preserves `#` characters inside
full-description text); Java compiles; all `@PositiveRuleSample` /
`@NegativeRuleSample` annotations are preserved.
@misonijnik misonijnik force-pushed the misonijnik/xss-rework branch from cdd8e02 to fc10271 Compare May 14, 2026 14:16
@misonijnik misonijnik merged commit 910476e into main May 14, 2026
5 checks passed
@misonijnik misonijnik deleted the misonijnik/xss-rework branch May 14, 2026 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Split XSS rules by response content type — error for HTML responses, info for the rest

3 participants