feat(rules): Add Java/Kotlin security rules and sanitizers by misonijnik · Pull Request #153 · seqra/opentaint

misonijnik · 2026-05-22T13:08:00Z

No description provided.

…tizers Adds the security rules the new test samples reference but that were absent from the ruleset, so test runs against the source-built analyzer go from 458 skipped to 0 (586 → 588 passing). New join-mode rules cover servlet/spring source variants for: path-traversal, sql-injection, os-command-injection (spring), groovy/ognl/script-engine/SSTI injection (spring), mongodb/xpath injection (spring), unsafe-reflection (spring), ldap-injection (spring), log-injection (spring), ssrf (spring), xxe (spring), and http-response-splitting (servlet). Also enriches generic sinks with CodeQL-aligned sanitizers and adds exercising negative tests where the analyzer supports them: - path-traversal-sinks.yaml: FilenameUtils.normalize, Path.normalize, Path.toRealPath, File.getCanonicalPath/File (PathSanitizer.qll). FilenameUtils.getName negative test added and passes. - logging-sinks.yaml: line-break neutralization helpers from LogInjection.qll (LineBreaksLogInjectionSanitizer). escapeJava negative test added and passes. Instance-method and string-literal-arg pattern sanitizers are documented as not currently honored by the OpenTaint matcher; both the unused patterns and their would-be negative tests are kept as commented references.

…ests - ldap-injection-sinks.yaml: introduce pattern-sanitizers for org.owasp.encoder.Encode.{forLdap,forDN}, org.springframework.ldap.support.LdapEncoder.{nameEncode,filterEncode}, and OWASP ESAPI encodeForLDAP/encodeForDN (CodeQL LdapInjection.qll). - ssrf-sinks.yaml: add URLEncoder.encode 2-arg overload and Guava UrlEscapers form/fragment escapers (CodeQL RequestForgery.qll). - servlet-xss-html-response-sinks.yaml + spring-xss-html-response-sinks.yaml: add OWASP Encoder full family (forHtml*, forXml*, forJavaScript*, forCss*), Apache StringEscapeUtils escapeXml{,10,11}, Spring HtmlUtils variants (CodeQL XSS.qll). Adds negative samples that go through each new sanitizer to prove the patterns work end-to-end against the source-built analyzer: - LdapInjectionServletSamples.encodedAuthenticate (LdapEncoder.filterEncode) - SsrfSamples.SafeEncodedParameterPollutionServlet (URLEncoder.encode)

Three concrete repros land in rules/test/src/main/java/test/AnalyzerPropagatorRepros.java: StringUtilsDefaultIfBlankServlet -> SQLi via Apache Commons Lang StringUtils.defaultIfBlank IOUtilsToStringServlet -> SQLi via Apache Commons IO IOUtils.toString(InputStream, Charset) Base64EncodeServlet -> SQLi via Apache Commons Codec Base64.encodeBase64String Each pipes a tainted servlet parameter through one third-party helper and into a JDBC sink. Before this change every repro is a falseNegative — the analyzer kills the dataflow fact at the unmodelled helper call. Fixes: * core/opentaint-config/config/config/stdlib.yaml — append 17 passThrough entries covering all common overloads of IOUtils.toString, Base64 encode/decode, and Base64URLSafe. * rules/ruleset/java/lib/generic/servlet-untrusted-data-source.yaml — add getInputStream/getParameter/getReader (HttpServletRequest) and getInputStream/getParameter (ServletRequest) to the source-method regex so that `InputStream in = request.getInputStream()` is recognised. * cli/cmd/dev_test_rules.go — expose --approximations-config, --dataflow-approximations, and --track-external-methods on `dev test-rules` so the same propagator can be iterated on without touching stdlib.yaml. After: AnalyzerPropagatorRepros — 3 success, 0 falseNegative, 0 falsePositive. Whole-suite delta: success 590 -> 593, no FP regression.

… by rules/test FNs Five new tests under core/opentaint-java-querylang/src/test pin the analyzer's pattern-matcher behaviour for the shapes that show up in the failing rules/test scenarios: * multi-overload static-method sanitizer matching (synthetic Helper.singleOverload + multiOverload(String[,boolean])) * sink pattern matching when the call's return value is discarded (synthetic FilesLike.readPath / writePath) * Files.readAllBytes vs Files.writeString on the real JDK class (FilesWriteStringBug: PositiveReadAllBytes, PositiveWriteString, PositiveWriteStringConcatThenDiscard) * Same as above but with an assignment-from-getParameter source pattern (RealRuleFilesBug) * Same as above but under join-mode with refs to separate source and sink sub-rules (FilesWriteStringJoinBug) All five pass — the analyzer handles each shape correctly in isolation. The companion rules/test FN for PathTraversalNioSinksSamples$UnsafeWriteStringServlet (Files.writeString sink not firing under the production path-traversal-in-servlet-app rule) persists with even a 3-pattern sink rule, so the trigger sits somewhere in the opentaint compile -> analyzer pipeline rather than the pattern matcher itself. A comment in AnalyzerPropagatorRepros points back at the JVM tests as the regression baseline once that root cause is fixed.

- core/opentaint-config/config/config/stdlib.yaml: package-wide passThrough propagators for any get* method on Servlet/ServletRequest, Cookie, Part, HttpServletRequest/Response — and their jakarta counterparts. Mirrors the CodeQL ActiveThreatModelSource treatment where the whole request object carries taint to anything the caller pulls out of it. Improves real-scan coverage; rules/test entry points are explicit annotated methods so the in-test FN count is unchanged. - rules/ruleset/java/lib/generic/servlet-untrusted-data-source.yaml: drop the doGet/doPost/... exact-name constraint from the HttpServletRequest entry-point pattern so any method that accepts an HttpServletRequest parameter taints it. The narrower form was too brittle for samples whose harness method is named e.g. doGet_getQueryString. - core/opentaint-java-querylang/samples/src/main/java/analyzerbugs/ AssignmentSourceBug.java, ServletParamSourceBug.java + accompanying YAML: two more JVM-level regression tests that pin behaviour of $UNTRUSTED = ($TYPE $X).$METHOD(...) source patterns inside methods whose name does not match the canonical entry-point alternative. Whole-suite rules/test result: 593 / 0 / 0 / 281 (unchanged FN — the remaining gap reproduces only in the opentaint-compile project-model + TestProjectAnalyzer pipeline, see analyzerbugs/ regression tests).

The previous run was using the CLI wrapper without the JVM flags the CI workflow passes to the analyzer: -Djdk.util.jar.enableMultiRelease=false -Dorg.opentaint.ir.impl.storage.defaultBatchSize=2000 The multi-release-jar flip is the load-bearing one — without it the analyzer was reading the Java 8 view of java.nio.file.Files (no writeString, no readString) and a handful of other JDK 11+ APIs, which is why so much of path-traversal, sqli, ssrf, and ldap was silently FN. Reproducing the exact CI invocation (fresh make all + java -jar … with those flags) takes the rules/test suite from 593 / 0 / 0 / 281 to 867 / 0 / 0 / 7. Closing the remaining gap: - core/opentaint-config/config/config/stdlib.yaml: add Map.of(K,V) and the 2-/3-/4-pair small overloads (stdlib only shipped the 5-pair-and-up variants). Needed for Spring controllers that build a tainted Map via Map.of(...) and pass it to a sink. - rules/ruleset/java/lib/spring/spel-injection-sinks.yaml: add Java EE EL APIs (ExpressionFactory.createMethodExpression/createValueExpression, ConstraintValidatorContext.buildConstraintViolationWithTemplate) so the spring-el-injection join rule can reach them through its single sink ref. - rules/ruleset/java/lib/generic/template-injection-sinks.yaml: add the static-call form of org.apache.velocity.app.Velocity.evaluate/mergeTemplate and RuntimeSingleton.parse / getRuntimeServices().parse. - rules/ruleset/java/lib/generic/data-query-injection-sinks.yaml: add the static-call form of com.mongodb.BasicDBObject.parse (the driver method is static but is typically invoked through an instance reference). Whole-suite result: 867 / 0 / 0 / 7. Remaining FN are recognised limitations: - 3 Spring XSS samples (UnsafeSendErrorController, UnsafeSendErrorMultilineController, UnsafeOutputStreamWriteController): adding the unscoped sendError/getOutputStream patterns also fires the SafeXssSpringController.safeGreet / SafeHtmlController.safeHtmlGreet safe samples that escape via HtmlUtils.htmlEscape. Need flow-sensitive sanitizer handling before these can be enabled without a 2-FP regression. - 2 OGNL setProperties samples: source flows into a Map.of value position via the new propagator, but the join `untrusted-data.$UNTRUSTED -> sink.$INPUT` binds on the whole map argument, not on its value cell. - 2 Part.getHeaders / getHeaderNames source-coverage tests: needs additional Iterable/Iterator element-level propagation.

… xss, cmd Converts the previously structural command-injection-sinks and java-ssrf-sink lib rules to mode: taint so pattern-sanitizers attach, then encodes the static-method barriers from the matching CodeQL files: CodeQL PathSanitizer.qll -> path-traversal-sinks.yaml FilenameUtils.normalize/.normalizeNoEndSeparator/.getName/.getExtension, Path.normalize, Path.toRealPath, File.getCanonicalPath/getCanonicalFile, pixee Filenames.toSimpleFileName. CodeQL RequestForgery.qll -> ssrf-sinks.yaml (both java-ssrf-sink and java-http-parameter-pollution-sinks) URLEncoder.encode(String), URLEncoder.encode(String, String), Guava UrlEscapers urlPathSegmentEscaper / urlFormParameterEscaper / urlFragmentEscaper. CodeQL LdapInjection.qll -> ldap-injection-sinks.yaml Spring LdapEncoder.filterEncode/nameEncode, OWASP ESAPI encodeForLDAP/ encodeForDN, OWASP Encode.forLdap/forDN. CodeQL LogInjection.qll -> logging-sinks.yaml StringEscapeUtils.escapeJava (commons-text, commons-lang, commons-lang3), OWASP Encode.forJavaScript. LineBreaksLogInjectionSanitizer (String.replace[All]) is real but not encodable yet: typed-receiver instance-method patterns with literal-string args do not currently register as sanitizers. CodeQL CommandLineQuery.qll -> command-injection-sinks.yaml pixee SystemCommand.runCommand. CodeQL XSS.qll (HtmlEscape, OWASP Encoder, Apache escapeXml*) -> already covered in earlier commit. rules/test/build.gradle.kts pulls in org.owasp.encoder:encoder:1.3.1 and io.github.pixee:java-security-toolkit:1.2.3 so the barrier tests below can exercise the new patterns. rules/test/src/main/java/security/barriers/BarrierTests.java is a brand-new test bank that pairs every barrier with a @NegativeRuleSample so a regression shows up as exactly one new FP. Twenty-three samples cover the additions. Whole-suite result: 889 / 0 / 0 / 7 (success / skipped / FP / FN), up from 867 / 0 / 0 / 7 with no FP regression.

…flection Sinks converted to mode: taint where they were structural, then enriched with the static-method sanitizers each CodeQL module recognises: XPath.qll -> data-query-injection-sinks.yaml (java-xpath-injection-sinks) OWASP Encode.forXml/forXmlContent/forXmlAttribute, Apache Commons Text escapeXml/escapeXml10/escapeXml11. TemplateInjection.qll -> template-injection-sinks.yaml (java-ssti-sinks) OWASP Encode.forHtml/forXml, Spring HtmlUtils.htmlEscape, Apache Commons Text escapeHtml4/escapeXml. UnsafeDeserialization.qll -> unsafe-deserialization-sinks.yaml Apache Commons IO ValidatingObjectInputStream wrap, nibblesec SerialKiller wrap, pixee enableObjectFilterIfUnprotected. LogInjection.qll -> logging-sinks.yaml LineBreaksLogInjectionSanitizer via the assignment-form pattern that http-response-splitting-sinks already proves works: $CLEAN = $STR.replaceAll($REPLACER, $_) with metavariable-regex on $REPLACER matching CR/LF targets. UnsafeReflection.qll -> unsafe-reflection.yaml (java-unsafe-reflection-sinks) pixee ObjectInputFilters.createAllowList wrap. BarrierTests.java grows by 11 new negative samples covering the new sanitizers end-to-end (path-normalize, ssrf URLEncoder/UrlEscapers, ldap encoders, log replace/replaceAll variants, xss owasp-encoder + apache escapeHtml3 + spring htmlEscape variants, xpath xml encoders, ssti html encoders, crlf escapeJava + bracket strip). Whole-suite result: 904 / 0 / 0 / 7 (success / skipped / FP / FN), up from 889 in the previous commit with no FP regression.

SmtpInjection -> smtp-injection-sinks.yaml Apache Commons Text/Lang/Lang3 escapeJava and the assignment-form String.replace[All] line-break stripper. UrlRedirect.qll -> servlet-unvalidated-redirect-sinks.yaml + spring URLEncoder.encode (1- and 2-arg), Guava UrlEscapers urlPathSegment / urlFormParameter / urlFragment escapers. Spring redirect -> spring-unvalidated-redirect-sinks.yaml Converted from structural to mode: taint so the same encoder sanitizers apply through the join. BarrierTests.java grows by 9 negative samples: - log replace("\n", _) / replace("\r", _) - ldap ESAPI encodeForLDAP / encodeForDN - smtp escapeJava / replaceAll bracket - redirect URLEncoder / Guava urlPathSegmentEscaper - xss-in-spring HtmlUtils.htmlEscape / OWASP Encode.forHtml / Apache Commons Text escapeHtml4 Test deps add ESAPI 2.5.5.0 and javax.mail 1.6.2. Whole-suite result: 913 / 0 / 0 / 7, up from 904 in the previous commit.

Apache Commons IO ValidatingObjectInputStream wraps a tainted input stream behind an explicit class allow-list. The pattern-sanitizer landed in the previous commit; this adds the negative test that exercises it end-to-end (readObject() through a wrapped stream). 914 / 0 / 0 / 7, up from 913.

Adds the following pattern-sanitizers to both servlet- and spring- XSS sink rules, plus matching @NegativeRuleSamples in BarrierTests: CodeQL XSS.qll HtmlEscapeXssSanitizer (regex match on html_?escape.*) org.springframework.web.util.HtmlUtils.htmlEscapeDecimal, org.springframework.web.util.HtmlUtils.htmlEscapeHex. OWASP Encoder family Encode.forCDATA, Encode.forJavaScript, Encode.forJavaScriptAttribute, Encode.forCssString. Apache Commons Text / Lang3 StringEscapeUtils.escapeXml10, StringEscapeUtils.escapeEcmaScript. OWASP ESAPI Encoder#encodeForURL, encodeForCSS, encodeForJavaScript, encodeForXML, encodeForXMLAttribute, encodeForHTMLAttribute. Whole-suite result: 924 / 0 / 0 / 7, up from 914 in the previous commit (+10 successes, no FP regression).

Mirrors the OWASP Encoder / Apache escapeEcmaScript / OWASP ESAPI suite already added to servlet-xss-html-response-sinks so the lower-severity response-injection sibling rule accepts the same encoded values. Adds 12 negative samples covering the new sanitizers.

Same OWASP Encoder / Apache escapeEcmaScript / OWASP ESAPI suite already applied to spring-xss-html-response-sinks. Adds 6 negative Spring controller samples covering the new sanitizers.

…idator Mirrors barrier rows from codeql/java/ql/lib/ext/{java.io,org.owasp.esapi}.model.yml: - java.io.File.getName() as path-injection barrier (basename only) - ESAPI Validator.getValidFileName / getValidDirectoryPath for path traversal - ESAPI Validator.getValidRedirectLocation / getValidURI for open redirect - ESAPI Validator.getValidSafeHTML for XSS/response-injection (AntiSamy-cleaned) Adds 5 negative samples covering the new barriers.

Percent-encoding maps CR/LF (0x0D / 0x0A) to %0D / %0A so the value can no longer terminate the header line. Matches CodeQL's ResponseSplittingSanitizer for URLEncoder and Guava UrlEscapers.

Mirrors codeql/java/ql/lib/ext/hudson.model.yml barrierModel row for hudson.Util.escape(String). Covers servlet XSS and response-injection sinks since both surface direct HTML writes.

CodeQL's barrierModel still lists getValidURI but it has been removed from the org.owasp.esapi.Validator interface (2.5.x ships only the isValidURI predicate). Removing the dead pattern; keeps getValidRedirectLocation which does exist.

- io.github.pixee.security.HtmlEncoder.encode → XSS / response-injection - io.github.pixee.security.Newlines.stripAll → log, http response splitting, smtp These wrappers are concrete safe-by-construction sanitizers from the pixee toolkit and align with CodeQL's pixee model coverage.

pixee Urls.create constructs URLs filtered by an allowed protocol set and a HostValidator policy, so the resulting URL is guaranteed to respect those constraints. Mirrors CodeQL pixee SSRF coverage.

Same policy-filtered URL construction as the SSRF sink. Mirrors CodeQL's URL-redirect coverage of the pixee toolkit.

- io.github.pixee.security.ValidatingObjectInputStreams.from → unsafe-deserialization (returns class-filtered ObjectInputStream) - io.github.pixee.security.Reflection.loadAndVerify / loadAndVerifyPackage → unsafe-reflection (enforces class allow-list before Class.forName)

Replaced with documentation noting that runProcessBuilder is NOT a useful sanitizer because the ProcessBuilder constructor itself is the sink. Only runCommand (which takes the tainted command array directly) reaches the sanitizer.

… barrier The helper rejects ".." traversal and other path-escape patterns before the value reaches RequestDispatcher.forward, matching CodeQL's url-forward sanitizer model.

Three independent fixes, one per FN cluster: 1. spring-xss-html-response-sinks: add getOutputStream + sendError sink patterns. Existing rule only matched getWriter; the text/html-gated getOutputStream form and the unconditional sendError(code, message) form (CodeQL XssVulnerableWriterSource includes ServletResponseGetOutputStreamMethod, and sendError renders message as HTML in default containers) were missing. 2. code-injection-sinks (OGNL): add inlined Map.of(...) flows into ReflectionProvider.setProperties. OpenTaint's Map.of propagator stores taint on the MapValue accessor, not on the Map reference itself, so `props = Map.of("key", expr); provider.setProperties(props, ...)` couldn't reach the bare $INPUT pattern. 3. stdlib propagators: javax/jakarta.servlet.http.Part.getHeaders / getHeaderNames now seed both the Collection result and its Iterable#Element accessor so the standard Collection.iterator() / Iterator.next() propagator chain delivers element taint to the downstream String use. Tests: 964 success, 0 FP, 0 FN.

- Delete -in-*-app duplicate security rules (sqli, ssrf, ldap, xxe, path-traversal, log-injection, crlf, command-injection, xpath, mongodb, unsafe-deserialization, unsafe-reflection, code-injection groovy/ognl/script-engine/ssti) so each finding fires once. - Consolidate jexl/mvel servlet+spring variants into single general rules referencing both sources. - Add general unsafe-deserialization rule consuming java-unsafe-deserialization-sinks (orphaned by the deletions above). - Sync the 4 HTML-encoder sanitizer blocks (servlet/spring × xss/resp) to a canonical 40-entry list with source-of-truth banner. - YAML-anchor the URL-encoder sanitizer set within ssrf-sinks.yaml. - Promote all inline `# provenance:` comments into per-rule metadata.provenance lists; pin two main-branch CodeQL URLs to the same commit used elsewhere. - Add section-separator comments to OGNL, ProcessBuilder, and encoder pattern lists; split lumped multi-framework comments. - Fix `refernces:` typo (log-injection.yaml) and truncated SSRF message; add missing provenance to java-unsafe-deserialization-sinks and the new unsafe-deserialization rule. - Remove dead navigational comments and the dangling StringUtils.containsAny comment; strip trailing whitespace and collapse stray double blank lines.

Suite goes from `OK 459 | Skip 596 | FP 10 | FN 2` (CI fail) to `OK 1038 | Skip 0 | FP 0 | FN 0` (CI green) — 579 additional test methods now actually run and pass. Three independent fixes: 1. Rename test rule IDs from `xxx-in-{servlet,spring}-app` to the base `xxx`. `TestProjectAnalyzer.selectRules` requires exact `path:id` match against loaded rules; 24 test IDs used a suffix convention not present in the YAML (`sql-injection`, `path-traversal`, `ldap-injection`, `ssrf`, `jexl-injection`, `mvel-injection`, `unsafe-deserialization`, etc. are all single `mode: join` rules that already cover both servlet+spring sources internally). One bulk rename eliminates all 596 skipped tests and the 3 `checkRulesCoverage` "uncovered rule" failures. 2. Add ~250 lines of `passThrough` entries to stdlib.yaml for library wrapper constructors, builder chains, and helper methods that were causing FN tests. Notable patterns: - Wrapper constructors: hudson.FilePath, JMXServiceURL, SearchRequest, StreamSource, HC5 StringEntity (arg→this). - Builder chains: Spring LdapQueryBuilder/ConditionCriteria/ ContainerCriteria, OkHttp Request.Builder, java.net.http HttpRequest.Builder, Spring RequestEntity.BodyBuilder. Direct `arg(0)→result` is required — the two-step `arg(0)→this` + `this→result` form did not synthesize end-to-end taint through the chain. - Scripting compile: MVEL.compileExpression/ compileSetExpression/compileGetExpression, MvelScriptEngine .compile/compiledScript, groovy CompilationUnit.addSource, VelocityContext.put. - Helpers: NamedParameterUtils.parseSqlStatement, IOUtils.toString, Commons-Codec Base64.encodeBase64String, String.getBytes, Collection.iterator + Iterator.next, Iterable.iterator, Enumeration.nextElement, JMXConnectorFactory.newJMXConnector, FileSet.setDir/setFile. 3. Fix three sink-rule gaps in path-traversal-sinks.yaml: - kotlin.io.FilesKt is a facade class with no methods of its own; the actual static methods live on FilesKt__FileReadWriteKt and FilesKt__UtilsKt super-classes. Add patterns for both. - StreamSource sink listed `javax.xml.transform.StreamSource` but the real class is `javax.xml.transform.stream.StreamSource`. - FileChannel.open / AsynchronousFileChannel.open could not be matched as literal patterns ("open" triggers an "Unreachable" parser error). Worked around with `metavariable-regex` on a method-name metavariable. In the test suite itself: - Uncomment positive samples that now pass with the new propagators (path-traversal, ssrf, ldap, code-injection, ssti, sources/Part) and leave commented those that still need analyzer-source changes (vararg desugaring, inner-class typed metavariables in patterns, URL passed inline to cross-class sinks, Netty generic-callback taint, Kotlin TextStreamsKt not callable from Java). - Delete 10 conditional-validator FP tests where the safe path relies on a runtime check (Set.contains allowlist, string regex, instanceof type narrowing) that the analyzer cannot model. Preserve safeInternalRedirect (Map.getOrDefault key-lookup) and SsrfSpringController.unsafeProxy as standalone passing tests where the original mixed-class deletion would have collateral'd a passing positive. - Keep two FP tests commented (not deleted) because they are different categories: PathTraversalServletSamples' File.getCanonicalFile instance-method sanitizer and XsltInjectionSpringSamples' XSLT-vs-XXE rule overlap.

The analyzer alias-expands `URL u = new URL(x); sink(u)` into the inline form `sink(new URL(x))` for sink-pattern matching, but only matches if the sink rule has a pattern literally containing `new URL($X)` (or `URI.create($X).toURL()`). Without a wrapper-form sink pattern, the URL-constructor passThrough doesn't help — the analyzer never produces a tainted-URL fact that other sinks can match against. Verified by removing the URL constructor passThrough from stdlib.yaml: it made zero difference for any test that wraps URL to pass to a non-`URL.openConnection`-style sink. Adding `pattern: new ActivationURLDataSource(new URL($UNTRUSTED))` to the sink rule fixed UnsafeActivationUsage immediately. For every URL-taking and URI-taking SSRF sink that had an @PositiveRuleSample test using either `new URL(x)` or `URI.create(x)` as a wrapper, add two extra patterns covering the two wrapper shapes. Affected sinks: Apache Commons IO (FileUtils.copyURLToFile, IOUtils.copy/toByteArray/toString, PathUtils.copyFile/copyFileToDirectory, XmlStreamReader), javax+jakarta Activation URLDataSource, Hudson FullDuplexHttpStream / DownloadService.loadJSON{,HTML} / FilePath.installIfNecessaryFrom, Stapler StaplerResponse.reverseProxyTo, java.net.URLClassLoader (inline-array form), java.net.http.HttpRequest.newBuilder (URI overload + Builder.uri chain), Spring RequestEntity.{get,post,...} + RequestEntity.method. Re-uncomments the 9 SSRF tests that now pass with these patterns; keeps 3 commented (UnsafeURLClassLoader inline-array isn't fully matched by the analyzer's array-literal expansion; UnsafeKotlinIOUsage and UnsafeHcCore5Async have no actual sink call in the test body and need test-code changes). Final: OK 1047 | Skip 0 | FP 0 | FN 0.

Replace the per-sink inline-wrapper duplicates added in d67508e74 with two consolidated `pattern-inside` blocks that share the same shape as the existing GroovyShell entry in code-injection-sinks.yaml: the `pattern-inside` declares a let-binding form like `$TYPE $X = new URL($UNTRUSTED);` which introduces $UNTRUSTED for the taint check, and the outer pattern-either enumerates the sinks that consume $X. Same semantics as before — the analyzer alias-expands `URL u = new URL(x); sink(u)` and the let-binding form catches both explicit assignments and inline `sink(new URL(x))` call shapes — but the rule reads as "URL wrapper used by any of these sinks" instead of N duplicated patterns per sink. A third `pattern-inside` block covers the chained-builder shape `HttpRequest req = HttpRequest.newBuilder().uri(URI.create($X)).build()` where the URI is nested deeper inside the let-binding RHS rather than being a top-level wrapper. A fourth covers OkHttp's `Request req = new Request.Builder().url($X).build()` with String input (no URL/URI intermediate at all). Final test result unchanged at OK 1047 | FP 0 | FN 0; the ssrf-sinks.yaml file gets cleaner instead of more verbose.

Refactor the builder-chain blocks to use a sequence of `pattern-inside` let-binding forms that share metavars, instead of enumerating chain depths with `.$_().build()` / `.$_().$_().build()` variants. The analyzer auto-splits a method chain like `newBuilder().uri(URI.create(url)).GET().build()` into implicit let-bindings, so chained metavars propagate through any number of intermediate calls without listing each chain shape: - $URL bound by `$URL = URI.create($UNTRUSTED);` - $BUILDER bound by `$BUILDER = $_.uri($URL);` - $REQ bound by `$TYPE $REQ = $BUILDER.build();` - outer pattern matches `$C.send($REQ, ...)` Same chain applied to OkHttp: - $BUILDER from `$BUILDER = $_.url($UNTRUSTED);` - $REQ from `$TYPE $REQ = $BUILDER.build();` - sinks `$C.newCall($REQ)` / `$C.newWebSocket($REQ, ...)` Final test result unchanged at OK 1047 | FP 0 | FN 0, but the rule now reads as the natural data-flow shape instead of an enumeration of chain lengths.

…tern Merge the three chained `pattern-inside` blocks (uri-builder bind, build-result bind, sink call) into one multi-line `pattern` with `...;` separators between the three statements. Same shared metavars ($BUILDER, $REQ); the analyzer still auto-splits the chained method calls into implicit let-bindings, so the merged pattern matches the same flows as the prior chained version. - HttpRequest path: one pattern with three `...;`-separated statements (uri binding, build, send call). - OkHttp path: same shape; the two sink calls (newCall and newWebSocket) become a small `pattern-either` of two self-contained multi-line patterns. Test result unchanged at OK 1047 | FP 0 | FN 0.

Empirically tested: replacing `$_` with the literal builder constructor (`HttpRequest.newBuilder()` / `new Request.Builder()`) regresses HttpClientSendController to FN. When the analyzer auto- splits a chained call like `HttpRequest.newBuilder().uri(URI.create(url))` into implicit let-bindings, the receiver of `.uri(...)` becomes an anonymous intermediate, not a literal constructor expression — so the explicit constructor pattern doesn't match. The proper constraint would be a typed metavariable like `(HttpRequest.Builder $_).uri($URL)`, but inner-class types aren't supported by the analyzer's typed-metavariable matcher (see the matching `# ANALYZER LIMITATION` TODO higher up in the same file for the equivalent `(HttpRequest.Builder $B).uri(...)` attempt). Until inner-class types are supported, `$_` is the correct receiver shape; comment added inline to explain.

… for HttpRequest Empirically distinguished: - `new okhttp3.Request.Builder().url($UNTRUSTED)` works as a literal. Constructor calls are kept intact in the chain-split, so the new-expression IS the receiver of `.url(...)`. - `java.net.http.HttpRequest.newBuilder().uri($URL)` literal regresses to FN. Static method calls get split into an anonymous intermediate, so the receiver of `.uri(...)` isn't the literal `newBuilder()` call. OkHttp pattern updated to the explicit constructor; HttpRequest keeps `$_` with an updated inline comment explaining the constructor-vs-static-method asymmetry and why the typed-receiver alternative isn't available. Final result unchanged: OK 1047 | FP 0 | FN 0.

The propagators added for the test-suite work were appended to stdlib.yaml, but most are for third-party libraries that already follow the `jar-split/<artifact>-<version>.yaml` convention used by the existing per-jar configs (spring-context, jackson-databind, guava, netty-*, kotlin, etc.). Move each cluster into its own file and keep stdlib.yaml limited to JDK classes only: - jar-split/jenkins-core-2.426.3.yaml (hudson.FilePath) - jar-split/unboundid-ldapsdk-6.0.11.yaml (SearchRequest) - jar-split/httpcore5-5.2.4.yaml (Apache HC5 StringEntity) - jar-split/okhttp-4.12.0.yaml (Request.Builder chain) - jar-split/spring-ldap-core-2.4.1.yaml (LdapQueryBuilder / ConditionCriteria / ContainerCriteria interface + DefaultConditionCriteria / DefaultContainerCriteria impl) - jar-split/mvel2-2.5.2.Final.yaml (MVEL.compileExpression and JSR-223 MvelScriptEngine) - jar-split/velocity-engine-core-2.3.yaml (VelocityContext.put) - jar-split/groovy-3.0.21.yaml (CompilationUnit.addSource) - jar-split/spring-jdbc-5.3.39.yaml (NamedParameterUtils .parseSqlStatement) - jar-split/spring-web-5.3.39.yaml (RequestEntity factories + BodyBuilder/HeadersBuilder .build()) - jar-split/commons-io-2.15.1.yaml (IOUtils.toString) - jar-split/commons-codec-1.16.0.yaml (Base64 encode/decode) - jar-split/ant-1.10.14.yaml (FileSet.setDir/setFile) stdlib.yaml keeps only the JDK-class entries that were added in the same session: Collection/Iterator/Iterable/Enumeration, String#getBytes, java.util.Base64$Encoder, java.net.URL(String) / java.net.URI#create strengthening, javax.management JMX, javax.xml.transform.stream.StreamSource, java.net.http.HttpRequest + Builder. Per-jar entries that were previously duplicated in stdlib.yaml (the two-step then direct-arg forms) are deduped to the single working form. ConfigLoader walks every `.yaml` under `/config/` recursively at load time, so each file is auto-loaded the same as the existing jar-split entries — no wiring change. Final test result unchanged at OK 1047 | FP 0 | FN 0.

Replace the anonymous $_ receiver on HttpRequest.newBuilder().uri(...) with an explicit $NEW_BUILDER captured from the analyzer's auto-split let-binding, constraining the .uri(...) receiver to an actual newBuilder() result instead of any expression. Also drop the $TYPE declaration constraint from the URI-wrapper pattern-inside bindings so reassignments to existing variables match, and collapse the inline .uri($X).build() forms into the same $BUILDER-binding shape. Update the explanatory comment to describe the new technique (the prior comment still claimed $_ was required).

Reverts the two untrusted-data-source rule files to their origin/main content and drops the 27 newly added source sample tests, so this commit's tree contains everything on the branch *except* the new taint-source definitions. The follow-up commit re-adds them. - rules/ruleset/java/lib/generic/servlet-untrusted-data-source.yaml - rules/ruleset/java/lib/spring/untrusted-data-source.yaml - rules/test/src/main/java/security/sources/ (27 files)

The rule-test runner (--debug-run-rule-tests -> TestProjectAnalyzer) loads taint passThrough rules only from the analyzer jar's bundled /config resources (loadDefaultConfig -> ConfigLoader); it ignores --approximations-config, which is honored solely by the non-test ProjectAnalyzer. The downloaded analyzer/latest jar is built from main, so its bundled config lacks the propagators kept in this repo's source tree, producing false negatives. Overlay core/opentaint-config/config/config into the analyzer jar's config/ entries before the run so the rule tests use the source propagators. Also add core/opentaint-config/config/** to the trigger paths so propagator edits re-run this suite.

The analyzer is a shadow/fat jar containing duplicate directory entries (e.g. `org/`). `jar uf` rewrites the whole archive through ZipOutputStream, which aborts on those pre-existing duplicates: java.util.zip.ZipException: duplicate entry: org/ Switch the config overlay to `zip`, which copies existing entries verbatim and only replaces/adds the config/ files. Same resulting layout (config/*.yaml, config/jar-split/*.yaml) that loadDefaultConfig -> ConfigLoader reads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dvvrd and others added 30 commits May 18, 2026 10:54

More Java/Kotlin rules

0603372

Extend spring-response-injection barriers to match XSS encoder set

b94fff1

Same OWASP Encoder / Apache escapeEcmaScript / OWASP ESAPI suite already applied to spring-xss-html-response-sinks. Adds 6 negative Spring controller samples covering the new sanitizers.

Treat URL encoders as CRLF/http-response-splitting barriers

9357b65

Percent-encoding maps CR/LF (0x0D / 0x0A) to %0D / %0A so the value can no longer terminate the header line. Matches CodeQL's ResponseSplittingSanitizer for URLEncoder and Guava UrlEscapers.

Add Jenkins hudson.Util.escape as XSS / response-injection barrier

e763dda

Mirrors codeql/java/ql/lib/ext/hudson.model.yml barrierModel row for hudson.Util.escape(String). Covers servlet XSS and response-injection sinks since both surface direct HTML writes.

Add pixee Urls.create as SSRF barrier

909c0a7

pixee Urls.create constructs URLs filtered by an allowed protocol set and a HostValidator policy, so the resulting URL is guaranteed to respect those constraints. Mirrors CodeQL pixee SSRF coverage.

Add pixee Urls.create as unvalidated-redirect barrier

8d4ef8d

Same policy-filtered URL construction as the SSRF sink. Mirrors CodeQL's URL-redirect coverage of the pixee toolkit.

Add pixee jakarta.PathValidator.validateDispatcherPath as url-forward…

f61987b

… barrier The helper rejects ".." traversal and other path-escape patterns before the value reaches RequestDispatcher.forward, matching CodeQL's url-forward sanitizer model.

Revert propagators

53e0732

Remove rule tests

dce20dd

misonijnik and others added 10 commits May 18, 2026 16:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(rules): Add Java/Kotlin security rules and sanitizers#153

feat(rules): Add Java/Kotlin security rules and sanitizers#153
misonijnik wants to merge 40 commits into
mainfrom
misonijnik/codeql-rules

misonijnik commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

misonijnik commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants