Skip to content

Adding dependency graph submission workflow#3502

Merged
iamsanjay merged 3 commits intoapache:mainfrom
iamsanjay:enable_dep_graph
Sep 2, 2025
Merged

Adding dependency graph submission workflow#3502
iamsanjay merged 3 commits intoapache:mainfrom
iamsanjay:enable_dep_graph

Conversation

@iamsanjay
Copy link
Contributor

Inspired by: Lucene issue #15114

This adds GitHub’s dependency graph setup so it’s easier for us to keep an eye on dependencies.

  • Submits a full dependency graph (including transitives) that shows up under Insights → Dependency graph, so we can quickly check what we’re pulling in.

  • [Future PR] Runs dependency review on each PR, comparing against main and flagging any unexpected changes.

  • Hooks in Dependabot alerts so we get notified about High/Critical CVEs right away.

Overall, this makes troubleshooting, checking vulnerability status, and acting on alerts a lot smoother within the GitHub ecosystem. There’s plenty more potential here as we explore the feature further and as GitHub keeps expanding its toolkit.

Copy link
Contributor

@epugh epugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I occasionally have angst about using all the Github features, cause someday they will turn obviously evil and then we'll be locked in so tight with them...

Also, I love how clear your issue descriptions are!

@iamsanjay
Copy link
Contributor Author

For now I’m not removing anything like the OWASP task. My main intention is to make this more visible, and I think GitHub is the perfect place for that. The UI makes it easier (and honestly cooler) for people to actually pay attention to CVE info, which usually feels pretty boring.

uses: gradle/actions/dependency-submission@v4
with:
build-scan-publish: true
develocity-access-key: ${{ secrets.SOLR_DEVELOCITY_ACCESS_KEY }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really want Develocity integration for this workflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really actually. I'll remove it.

Comment on lines +32 to +35
DEPENDENCY_GRAPH_INCLUDE_CONFIGURATIONS: "(?i)(^|:)(compileClasspath|runtimeClasspath|testCompileClasspath|testRuntimeClasspath)$"
DEPENDENCY_GRAPH_EXCLUDE_CONFIGURATIONS: "(?i)(^|:)(classpath|.*PluginClasspath|kotlinCompilerClasspath|kaptClasspath|annotationProcessor|detachedConfiguration.*)$"
DEPENDENCY_GRAPH_RUNTIME_INCLUDE_CONFIGURATIONS: "(?i)(^|:)runtimeClasspath$"
DEPENDENCY_GRAPH_RUNTIME_EXCLUDE_CONFIGURATIONS: "(?i)(^|:)testRuntimeClasspath$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you come to choose this config?

@iamsanjay
Copy link
Contributor Author

How did you choose this config?

Short answer: ChatGPT.

Long answer:
The Dependency Graph was pulling in Gradle/plugin tooling (buildscript classpath, plugin classpaths, annotation processors, etc.), which added noise. I’m pruning the submission to keep only real app/test classpaths and exclude tooling. I also tag scopes so we can focus security on what actually ships.

  1. What we submit (include/exclude)
  • DEPENDENCY_GRAPH_INCLUDE_CONFIGURATIONS → only: compileClasspath, runtimeClasspath, testCompileClasspath, testRuntimeClasspath.

  • DEPENDENCY_GRAPH_EXCLUDE_CONFIGURATIONS → drop tooling: classpath, .PluginClasspath, kotlinCompilerClasspath, kaptClasspath, annotationProcessor, detachedConfiguration..

  1. How we scope it (runtime vs development)
  • DEPENDENCY_GRAPH_RUNTIME_INCLUDE_CONFIGURATIONS → treat runtimeClasspath as runtime (what ships).

  • DEPENDENCY_GRAPH_RUNTIME_EXCLUDE_CONFIGURATIONS → keep testRuntimeClasspath out of runtime; everything else is development. (Strictly optional, but makes the scope labels cleaner.)

Cleaner Dependency Graph/SBOM and more relevant Dependabot alerts—JUnit/Mockito/tooling CVEs won’t show up as runtime risks. We can filter by scope and fix runtime issues first.

Please review
Gradle can be surprising; there’s a real chance I excluded something we actually want tracked. If you see a config we should include , let me know and I’ll add it. OWASP Dependency-Check is still enabled—I’ll cross-check its findings against GitHub’s graph and follow up on any gaps.

https://github.com/gradle/github-dependency-graph-gradle-plugin

Copy link
Contributor

@malliaridis malliaridis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time in the past with the dependency graph and I looked into the build.gradle file of the solr:packaging module and the package.gradle to figure out what is shipped and what not. So I was considering of adding our custom configurations instead of the typical runtimeClasspath, so I used only packaging configuration and was thinking of adding packaging, runtimeLibs and / or solrPlatformLibs.

But I wasn't sure if that was correct thinking. Are those covered by the configurations you already include?

@iamsanjay
Copy link
Contributor Author

iamsanjay commented Aug 27, 2025

Regarding packaging, runtimeLibs, and/or solrPlatformLibs: these custom configurations mainly exist to “do the math” of wiring things together.

For example, solrPlatformLibs is defined to pull in several Solr modules:

dependencies {
  solrPlatformLibs project(":solr:core")
  solrPlatformLibs project(":solr:solrj")
  solrPlatformLibs project(":solr:api")
  solrPlatformLibs project(":solr:solrj-zookeeper")
  // libExt has logging libs, which we don’t want. Let users decide what they want.
  solrPlatformLibs project(path: ":solr:server", configuration: "libExt")

  // The cross-dc-manager uses the cross-dc Solr module libraries
  // as well as the Jetty server jars
  if (project.path == ":solr:cross-dc-manager") {
    solrPlatformLibs project(":solr:modules:cross-dc")
    solrPlatformLibs project(path: ":solr:server", configuration: "serverLib")
  }
}

Similarly, runtimeLibs just extends from runtimeElements:

runtimeLibs {
  extendsFrom runtimeElements
}


So if I look at the current config for the dependency graph, it’s basically a superset — everything required to develop or run the project across all these places:

DEPENDENCY_GRAPH_INCLUDE_CONFIGURATIONS: "(?i)(^|:)(compileClasspath|runtimeClasspath|testCompileClasspath|testRuntimeClasspath)$"

@ppkarwasz
Copy link
Contributor

Long answer: The Dependency Graph was pulling in Gradle/plugin tooling (buildscript classpath,
plugin classpaths, annotation processors, etc.), which added noise. I’m pruning the submission to
keep only real app/test classpaths and exclude tooling. I also tag scopes so we can focus security on
what actually ships.

Looks great, thank you for sorting that out! 💯

One of the reasons I had dismissed the Gradle Dependency Submission action in my investigations was precisely the large number of non-runtime dependencies it was reporting. As you’ve shown, I just hadn’t looked deep enough into the plugin’s configuration options. My approach was heading in a different direction:

  • Generate an SBOM for the tar.gz distributions.
  • Submit that SBOM to GitHub’s Dependency Graph.

The motivation for focusing on the tar.gz distributions is that we need to distinguish between shipped and non-shipped dependencies:

  • If solr-core depends on something but that JAR is not shipped in the distribution, then any vulnerability has the same implications as for any other Java library: end users can update it themselves, so it’s lower priority.
  • If the dependency is shipped, however, then every vulnerability potentially requires a new Solr release or at least a not-affected VEX statement.

That’s why, at least initially, I would prefer to limit submissions to dependencies that are actually shipped, ignoring the broader runtimeClasspath. (From my analysis, there are roughly half a dozen low–moderate vulnerabilities in shipped dependencies right now.)

I didn’t finish mapping all configurations that represent shipped artifacts, but I did confirm that packaging and similar configs don’t work well (they seem to be computed too late for the plugin to pick them up). What does work for me are these custom configurations in :solr:server:

// === Custom configurations used to assemble the Solr server binary distribution ===

// 1. Jetty Bootstrap JAR
// Output Path: server/start.jar
startJar

// 2. Server Libraries
// Output Path: server/lib/
// Description: Core runtime libraries (mostly Jetty-related JARs).
serverLib

// 3. Extended Server Libraries
// Output Path: server/lib/ext/
// Description: Optional runtime libs like logging (SLF4J/Log4j) and metrics (Dropwizard, etc.).
libExt

// 4. Solr Core JAR
// Output Path: server/solr-webapp/webapp/WEB-INF/lib/
solrCore

// 5. Solr Web Application Libraries
// Output Path: server/solr-webapp/webapp/WEB-INF/lib/
webapp

So far I’ve only analyzed the server folder of the distribution, but those configurations appear to line up exactly with the shipped JARs. That means we could narrow the submission with:

DEPENDENCY_GRAPH_INCLUDE_CONFIGURATIONS="startJar|serverLib|libExt|solrCore|webapp"

I'll look at the remaining components of the distribution archive this week.

PS: I’ve also looked into Dependabot’s support for updating Gradle lock files. While basic support has been available since June, there are still unresolved issues when a project uses both lock files and version catalogs (see dependabot/dependabot-core#12557).

@iamsanjay
Copy link
Contributor Author

With the configuration you shared, the total dependency count is down to 138 (from 518 under the Maven ecosystem). Dependabot alerts have also dropped to 4 from 21. Does this reduction also cover Solr modules?

That said, I’m only seeing 1 high and 1 moderate vulnerability, whereas you mentioned there should be at least half a dozen—so we might still be missing some then. I should also try to generate SBOM on tar.gz to see the difference against the one which we can export from the Github.

@ppkarwasz
Copy link
Contributor

With the configuration you shared, the total dependency count is down to 138 (from 518 under the Maven ecosystem). Dependabot alerts have also dropped to 4 from 21. Does this reduction also cover Solr modules?

No, the configuration I provided only covers the server folder of the distribution. By a quick count:

  • 138 is exactly the number of JARs in server, so that part matches up.
  • modules should contribute around 305 JARs.
  • cross-dc-manager adds about 40 JARs.
  • prometheus-exporter adds about 5 JARs.

So in total, we should expect roughly 488 JARs across all shipped components.

That said, I’m only seeing 1 high and 1 moderate vulnerability, whereas you mentioned there should be at least half a dozen—so we might still be missing some then. I should also try to generate SBOM on tar.gz to see the difference against the one which we can export from the Github.

I may have overstated it a little, but there are at least these open issues:

I also have a script ready that can automatically submit these to JIRA. I’ve held off running it so far, because I want to first clean up the dependency graph in my fork, otherwise I will be creating JIRA issues for irrelevant build-time dependencies.

@iamsanjay
Copy link
Contributor Author

The one I shared earlier comes out to about 518, so not too far off from the ~488 you counted. It’s just using the usual Gradle configs (compile, runtime, etc.), so fine as a starting point. Our main concern is really the high/moderate vulnerabilities, and that count isn’t too high. This setup doesn’t add much noise—maybe a bit in the dependency graph—but from Dependabot’s side we can always scope things based on runtime vs. development.

DEPENDENCY_GRAPH_INCLUDE_CONFIGURATIONS: "(?i)(^|:)(compileClasspath|runtimeClasspath|testCompileClasspath|testRuntimeClasspath)$"
DEPENDENCY_GRAPH_EXCLUDE_CONFIGURATIONS: "(?i)(^|:)(classpath|.*PluginClasspath|kotlinCompilerClasspath|kaptClasspath|annotationProcessor|detachedConfiguration.*)$"
DEPENDENCY_GRAPH_RUNTIME_INCLUDE_CONFIGURATIONS: "(?i)(^|:)runtimeClasspath$"
DEPENDENCY_GRAPH_RUNTIME_EXCLUDE_CONFIGURATIONS: "(?i)(^|:)testRuntimeClasspath$"

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @iamsanjay,

I ran some additional checks to compare the dependencies reported as Runtime in the GitHub Dependency Graph against the actual Solr binary distribution, and the results look very promising:

  • The Apache Solr 9.9.0 binary distribution ships with 421 third-party JARs, plus 24 Solr JARs, 23 duplicates, and one example JAR: 489 JARs in total. The full list is available in this gist.

  • With the configuration from this PR (plus startJar), all of those dependencies are reported as Runtime dependencies to GitHub, which aligns nicely with what is actually shipped.

  • A few dependencies are reported as Runtime but are not included in the binary distribution:

    com.carrotsearch.randomizedtesting:randomizedtesting-runner:2.8.1
    com.squareup.okio:okio:3.6.0
    junit:junit:4.13.2
    org.apache.lucene:lucene-test-framework:9.12.2
    org.hamcrest:hamcrest:3.0
    org.hamcrest:hamcrest-core:3.0
    org.jctools:jctools-core:4.0.5
    org.openjdk.jmh:jmh-core:1.37
    org.quicktheories:quicktheories:0.26
    

    These are not as important as the shipped JARs, but won't generate too much CVE noise in practice.

  • The distribution also contains 9 shaded artifacts. Since GitHub won’t warn us about CVEs in shaded dependencies, we’ll probably want to keep these manually upgraded to the latest available versions:

    com.fasterxml.woodstox:woodstox-core:7.0.0
    com.ibm.icu:icu4j:74.2
    io.netty:netty-common:4.1.114.Final
    io.opentelemetry:opentelemetry-sdk-trace:1.40.0
    org.apache.curator:curator-client:5.7.0
    org.apache.hadoop:hadoop-client-api:3.4.0
    org.apache.hadoop:hadoop-client-runtime:3.4.0
    org.apache.hadoop.thirdparty:hadoop-shaded-guava:1.2.0
    org.tallison.xmp:xmpcore-shaded:6.1.10
    
  • One interesting observation: log4j-core is included in the reported dependencies. Normally, artifacts published to Maven Central should not expose logging implementations (see ZOOKEEPER-4820), but in this case it’s being pulled in by solr-prometheus-exporter. I’ll open a follow-up issue + PR to address that. Once it’s removed from runtimeClasspath, we’ll also need to update this workflow.

Overall, this configuration looks like a solid match for what we distribute. Nice work! 💯

Comment on lines +24 to +27
DEPENDENCY_GRAPH_INCLUDE_CONFIGURATIONS: "(?i)(^|:)(compileClasspath|runtimeClasspath|testCompileClasspath|testRuntimeClasspath)$"
DEPENDENCY_GRAPH_EXCLUDE_CONFIGURATIONS: "(?i)(^|:)(classpath|.*PluginClasspath|kotlinCompilerClasspath|kaptClasspath|annotationProcessor|detachedConfiguration.*)$"
DEPENDENCY_GRAPH_RUNTIME_INCLUDE_CONFIGURATIONS: "(?i)(^|:)runtimeClasspath$"
DEPENDENCY_GRAPH_RUNTIME_EXCLUDE_CONFIGURATIONS: "(?i)(^|:)testRuntimeClasspath$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this workflow could be simplified and clarified a bit:

  • Since we’re already explicitly listing the configurations we care about, we don’t really need an additional exclude list.
  • From what I understand, the regex is evaluated against the configuration name itself (not prefixed with the module name), so the (^|:) and $ anchors aren’t necessary. Empirical tests confirm this.
  • The startJar configuration isn’t currently included, so we should add it to the list.
  • We should also exclude the :solr:test-framework project from the runtime dependencies to avoid pulling it in unnecessarily.
Suggested change
DEPENDENCY_GRAPH_INCLUDE_CONFIGURATIONS: "(?i)(^|:)(compileClasspath|runtimeClasspath|testCompileClasspath|testRuntimeClasspath)$"
DEPENDENCY_GRAPH_EXCLUDE_CONFIGURATIONS: "(?i)(^|:)(classpath|.*PluginClasspath|kotlinCompilerClasspath|kaptClasspath|annotationProcessor|detachedConfiguration.*)$"
DEPENDENCY_GRAPH_RUNTIME_INCLUDE_CONFIGURATIONS: "(?i)(^|:)runtimeClasspath$"
DEPENDENCY_GRAPH_RUNTIME_EXCLUDE_CONFIGURATIONS: "(?i)(^|:)testRuntimeClasspath$"
DEPENDENCY_GRAPH_INCLUDE_CONFIGURATIONS: "(?i)(test)?(compile|runtime)classpath|startJar"
DEPENDENCY_GRAPH_RUNTIME_EXCLUDE_PROJECTS: ":solr:test-framework"
DEPENDENCY_GRAPH_RUNTIME_INCLUDE_CONFIGURATIONS: "runtimeClasspath|startJar"

@iamsanjay
Copy link
Contributor Author

Hey @ppkarwasz, thanks so much for looking into this. This is exactly what we needed to cross-check, and it looks like you’ve captured all the finer details. But I have one question currently I am only running for main, Should I include branch_9x. wdyt?

@ppkarwasz
Copy link
Contributor

Should I include branch_9x. wdyt?

It really depends on what you want to achieve.

  • Dependency Graph: GitHub only uses the graph from the default branch to determine dependent projects. When I tried submitting for another branch, I got this message:

    Submitted dependency-graph-reports/dependency_graph_submission-dependency-submission.json: The snapshot was accepted, but it is not for the default branch. It will not update dependency results for the repository.

    So including branch_9x won’t affect the dependency graph GitHub shows.

  • SBOM generation: You can still generate an SBOM from a non-default branch, but the SBOMs GitHub produces are of limited value: they often include extra dependencies and can only be created for branches (not for release tags). Personally, I prefer to control SBOM contents more tightly. I’m also working on SBOM Enforcer to enrich SBOMs with data that other tools miss: for example, shaded dependencies in Hadoop, which can be retrieved from Hadoop's SBOM (but not POM file). That’s a slow, ongoing effort, but I think I know how to tackle it.

  • Dependabot Alerts: In theory, you can get alerts on non-default branches (by configuring Dependabot to open 0 PRs there). In practice I haven’t managed to get this working, but I'll try.

@iamsanjay iamsanjay merged commit 80fa70f into apache:main Sep 2, 2025
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants