From 3fee703428bbe575509d3b362a5fdc97dff9c903 Mon Sep 17 00:00:00 2001 From: Maxim Stykow Date: Thu, 9 Apr 2026 20:17:53 +0200 Subject: [PATCH] docs(benchmarks): record verified django compare run Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus --- docs/BENCHMARKS.md | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/docs/BENCHMARKS.md b/docs/BENCHMARKS.md index 2de2d372c..598bbf19a 100644 --- a/docs/BENCHMARKS.md +++ b/docs/BENCHMARKS.md @@ -26,22 +26,23 @@ It is the maintained package-detection reference for current end-state compariso ## Current benchmark examples -| Target | Files | Machine info | Provenant total | ScanCode total | Relative result | End-state Provenant advantages over ScanCode | -| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -----: | ------------------------------------------------------ | --------------: | -------------: | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [`boostorg/boost @ 4f1cbeb`](https://github.com/boostorg/boost/tree/4f1cbeb724d9f3c08a826fbcee5a3db2f5480441) | 236 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `10.60s` | `58.14s` | `5.47×` faster (`-81.7%`) | More real copyright/author detections and cleaner copyright/author normalization | -| [`boostorg/json @ 70efd4b`](https://github.com/boostorg/json/tree/70efd4b032b7f3e718bb4ca4ae144c3171b21568) | 701 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `29.11s` | `139.57s` | `4.79×` faster (`-79.1%`) | Better structured-metadata handling, cleaner GSoC name normalization, and correct alternative-license interpretation for the embedded Ryu headers | -| [`kubernetes/kubernetes @ d3b9c54`](https://github.com/kubernetes/kubernetes/tree/d3b9c54bd952117924fb0790f6989c0d30715b19) | 29080 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `141.58s` | `2291.67s` | `16.19×` faster (`-93.8%`) | Broader Dockerfile and `go.work` package coverage, richer staging-workspace dependency extraction (`7187` vs `6950`), and richer `BSD-3-Clause AND Apache-2.0` compound license classification where ScanCode collapses many of the same files to plain `Apache-2.0` | -| [`apache/airflow @ 47ce5f3`](https://github.com/apache/airflow/tree/47ce5f32b4fae95f5865ba256d409c778d53a3d5) | 11854 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `76.18s` | `936.34s` | `12.29×` faster (`-91.9%`) | Far broader Python/provider package coverage (`142` vs `1`) and dependency extraction (`7569` vs `450`), plus extra Docker and Helm package visibility, safer URL credential stripping, and cleaner placeholder normalization | -| [`astral-sh/uv @ 9581f2b`](https://github.com/astral-sh/uv/tree/9581f2b0ea65550a3efe28bd7aabde19d98b39ba) | 1225 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `106.58s` | `261.33s` | `2.45×` faster (`-59.2%`) | Far broader Python-family package and dependency extraction (`112` vs `1` packages, `4488` vs `759` dependencies), including `uv.lock` plus nested `requirements/**` inputs, with safer URL credential stripping, Unicode-preserving party normalization, and METADATA-backed wheel identity instead of double-counting a misleading filename | -| [`npm/cli @ 05dbba5`](https://github.com/npm/cli/tree/05dbba5b8d727ddb2c098ce0553714eae791c5f2) | 6698 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 4 proc` | `295.10s` | `3376.85s` | `11.44×` faster (`-91.3%`) | Clean root npm workspace manifest coverage without ScanCode's workspace-assembly scan errors, fewer large registry-fixture JSON timeouts, and cleaner handling of duplicated private-workspace dependency exports and repeated MIT-style registry-fixture metadata noise | -| [`oven-sh/bun @ 700fc11`](https://github.com/oven-sh/bun/tree/700fc117a2fd01ac0201deaa6fa69c5557acb04f) | 12551 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `43.05s` | `849.10s` | `19.72×` faster (`-94.9%`) | Far broader Bun/npm-family package extraction (`382` vs `29` packages, `5773` vs `323` dependencies), legacy `bun.lockb` coverage on `bench/bundle`, and plainer `BSD-2-Clause` rebucketing where ScanCode uses the over-specific `BSD-2-Clause-Views` label | -| [`nmap/nmap @ d9199d7`](https://github.com/nmap/nmap/tree/d9199d7cd5e99f54fc4b67d592a30fa597a94c40) | 2587 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `52.87s` | `447.07s` | `8.46×` faster (`-88.2%`) | Broader package/dependency extraction (`18` vs `2` packages, `13` vs `2` dependencies), preserved NPSL/source-available handling across core Nmap and Zenmap reference-notice files, and cleaner rejection of weak translated-manpage GPL bare-word and placeholder noise | -| [`ffmpeg/ffmpeg @ 056562a`](https://github.com/ffmpeg/ffmpeg/tree/056562a5ff64e79ad40b141ded3f644811e812f6) | 10200 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `60.60s` | `812.80s` | `13.41×` faster (`-92.5%`) | Matched ScanCode's file-level Autotools `configure` package identity while also promoting one top-level Autotools package (`1` vs `0`), plus cleaner rejection of weak `configure` variable-name and bare-word GPL noise such as `EXTERNAL_LIBRARY_GPL_LIST` and `LICENSE_LIST="gpl"` | -| [`chromium/chromium @ 2befda7`](https://github.com/chromium/chromium/tree/2befda78fcc7fa5649540420eedcdd87a2583fe0) | 490886 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `1896.63s` | `21194.49s` | `11.17×` faster (`-91.1%`) | Near-aligned package coverage with Bazel (`1309` vs `1279`), materially richer dependency extraction (`16509` vs `12378`), matched scan-error counts (`4` vs `4`), and richer compound license expressions where ScanCode often collapses the same files to plainer permissive labels | -| [`mongodb/mongo @ d6877a3`](https://github.com/mongodb/mongo/tree/d6877a33a90e253f4e7a9641a3eb237518a5a495) | 52443 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `205.67s` | `4363.53s` | `21.22×` faster (`-95.3%`) | Broader package/dependency extraction (`40` vs `1` packages, `618` vs `7` dependencies), richer Debian namespace/PURL identity on package metadata, and cleaner lockfile/SBOM package and license shaping | -| [`debian:bookworm-slim @ sha256:f065376`](https://hub.docker.com/layers/library/debian/bookworm-slim/images/sha256-f06537653ac770703bc45b4b113475bd402f451e85223f0f2837acbf89ab020a) | 3267 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `21.05s` | `156.25s` | `7.42×` faster (`-86.5%`) | Better Debian dependency relationships from `dpkg/status`, source-faithful local-license resolution, and cleaner author/email/url results under the shared `common` profile | -| [`Fedora 26 rootfs fixture @ sha256:140ce3f`](../testdata/rpm/bdb-fedora-rootfs.tar.xz) | 1579 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `21.85s` | `129.25s` | `5.92×` faster (`-83.1%`) | Installed-RPM package and dependency extraction from the Fedora BDB where ScanCode emits no package/dependency objects under the shared profile, plus cleaner rejection of weak bare-word and filename-based RPM DB binary-text noise | -| [`Alpine 3.23.3 minirootfs @ sha256:42d0e6d`](https://dl-cdn.alpinelinux.org/alpine/latest-stable/releases/x86_64/alpine-minirootfs-3.23.3-x86_64.tar.gz) | 84 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `19.47s` | `23.84s` | `1.22×` faster (`-18.3%`) | Equal top-level Alpine package count with Alpine-native installed-db dependency requirements and virtual providers preserved, plus cleaner BusyBox/OpenSSL binary-text normalization and richer `os-release` package identity | +| Target | Files | Machine info | Provenant total | ScanCode total | Relative result | End-state Provenant advantages over ScanCode | +| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -----: | ------------------------------------------------------ | --------------: | -------------: | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [`boostorg/boost @ 4f1cbeb`](https://github.com/boostorg/boost/tree/4f1cbeb724d9f3c08a826fbcee5a3db2f5480441) | 236 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `10.60s` | `58.14s` | `5.47×` faster (`-81.7%`) | More real copyright/author detections and cleaner copyright/author normalization | +| [`boostorg/json @ 70efd4b`](https://github.com/boostorg/json/tree/70efd4b032b7f3e718bb4ca4ae144c3171b21568) | 701 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `29.11s` | `139.57s` | `4.79×` faster (`-79.1%`) | Better structured-metadata handling, cleaner GSoC name normalization, and correct alternative-license interpretation for the embedded Ryu headers | +| [`kubernetes/kubernetes @ d3b9c54`](https://github.com/kubernetes/kubernetes/tree/d3b9c54bd952117924fb0790f6989c0d30715b19) | 29080 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `141.58s` | `2291.67s` | `16.19×` faster (`-93.8%`) | Broader Dockerfile and `go.work` package coverage, richer staging-workspace dependency extraction (`7187` vs `6950`), and richer `BSD-3-Clause AND Apache-2.0` compound license classification where ScanCode collapses many of the same files to plain `Apache-2.0` | +| [`apache/airflow @ 47ce5f3`](https://github.com/apache/airflow/tree/47ce5f32b4fae95f5865ba256d409c778d53a3d5) | 11854 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `76.18s` | `936.34s` | `12.29×` faster (`-91.9%`) | Far broader Python/provider package coverage (`142` vs `1`) and dependency extraction (`7569` vs `450`), plus extra Docker and Helm package visibility, safer URL credential stripping, and cleaner placeholder normalization | +| [`astral-sh/uv @ 9581f2b`](https://github.com/astral-sh/uv/tree/9581f2b0ea65550a3efe28bd7aabde19d98b39ba) | 1225 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `106.58s` | `261.33s` | `2.45×` faster (`-59.2%`) | Far broader Python-family package and dependency extraction (`112` vs `1` packages, `4488` vs `759` dependencies), including `uv.lock` plus nested `requirements/**` inputs, with safer URL credential stripping, Unicode-preserving party normalization, and METADATA-backed wheel identity instead of double-counting a misleading filename | +| [`django/django @ 09f27cc`](https://github.com/django/django/tree/09f27cc373eb1e6e5e8b286204809a79b61d55c3) | 6994 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `29.74s` | `357.65s` | `12.03×` faster (`-91.7%`) | Far broader Python-family package and dependency extraction (`2` vs `1` packages, `16` vs `6` dependencies), including the root PyPI package from `pyproject.toml` plus documentation requirements from `docs/requirements.txt`, with clearer `BSD-3-Clause` declared-license capture and visibility into the vendored CVS marker that ScanCode skips | +| [`npm/cli @ 05dbba5`](https://github.com/npm/cli/tree/05dbba5b8d727ddb2c098ce0553714eae791c5f2) | 6698 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 4 proc` | `295.10s` | `3376.85s` | `11.44×` faster (`-91.3%`) | Clean root npm workspace manifest coverage without ScanCode's workspace-assembly scan errors, fewer large registry-fixture JSON timeouts, and cleaner handling of duplicated private-workspace dependency exports and repeated MIT-style registry-fixture metadata noise | +| [`oven-sh/bun @ 700fc11`](https://github.com/oven-sh/bun/tree/700fc117a2fd01ac0201deaa6fa69c5557acb04f) | 12551 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `43.05s` | `849.10s` | `19.72×` faster (`-94.9%`) | Far broader Bun/npm-family package extraction (`382` vs `29` packages, `5773` vs `323` dependencies), legacy `bun.lockb` coverage on `bench/bundle`, and plainer `BSD-2-Clause` rebucketing where ScanCode uses the over-specific `BSD-2-Clause-Views` label | +| [`nmap/nmap @ d9199d7`](https://github.com/nmap/nmap/tree/d9199d7cd5e99f54fc4b67d592a30fa597a94c40) | 2587 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `52.87s` | `447.07s` | `8.46×` faster (`-88.2%`) | Broader package/dependency extraction (`18` vs `2` packages, `13` vs `2` dependencies), preserved NPSL/source-available handling across core Nmap and Zenmap reference-notice files, and cleaner rejection of weak translated-manpage GPL bare-word and placeholder noise | +| [`ffmpeg/ffmpeg @ 056562a`](https://github.com/ffmpeg/ffmpeg/tree/056562a5ff64e79ad40b141ded3f644811e812f6) | 10200 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `60.60s` | `812.80s` | `13.41×` faster (`-92.5%`) | Matched ScanCode's file-level Autotools `configure` package identity while also promoting one top-level Autotools package (`1` vs `0`), plus cleaner rejection of weak `configure` variable-name and bare-word GPL noise such as `EXTERNAL_LIBRARY_GPL_LIST` and `LICENSE_LIST="gpl"` | +| [`chromium/chromium @ 2befda7`](https://github.com/chromium/chromium/tree/2befda78fcc7fa5649540420eedcdd87a2583fe0) | 490886 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `1896.63s` | `21194.49s` | `11.17×` faster (`-91.1%`) | Near-aligned package coverage with Bazel (`1309` vs `1279`), materially richer dependency extraction (`16509` vs `12378`), matched scan-error counts (`4` vs `4`), and richer compound license expressions where ScanCode often collapses the same files to plainer permissive labels | +| [`mongodb/mongo @ d6877a3`](https://github.com/mongodb/mongo/tree/d6877a33a90e253f4e7a9641a3eb237518a5a495) | 52443 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `205.67s` | `4363.53s` | `21.22×` faster (`-95.3%`) | Broader package/dependency extraction (`40` vs `1` packages, `618` vs `7` dependencies), richer Debian namespace/PURL identity on package metadata, and cleaner lockfile/SBOM package and license shaping | +| [`debian:bookworm-slim @ sha256:f065376`](https://hub.docker.com/layers/library/debian/bookworm-slim/images/sha256-f06537653ac770703bc45b4b113475bd402f451e85223f0f2837acbf89ab020a) | 3267 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `21.05s` | `156.25s` | `7.42×` faster (`-86.5%`) | Better Debian dependency relationships from `dpkg/status`, source-faithful local-license resolution, and cleaner author/email/url results under the shared `common` profile | +| [`Fedora 26 rootfs fixture @ sha256:140ce3f`](../testdata/rpm/bdb-fedora-rootfs.tar.xz) | 1579 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `21.85s` | `129.25s` | `5.92×` faster (`-83.1%`) | Installed-RPM package and dependency extraction from the Fedora BDB where ScanCode emits no package/dependency objects under the shared profile, plus cleaner rejection of weak bare-word and filename-based RPM DB binary-text noise | +| [`Alpine 3.23.3 minirootfs @ sha256:42d0e6d`](https://dl-cdn.alpinelinux.org/alpine/latest-stable/releases/x86_64/alpine-minirootfs-3.23.3-x86_64.tar.gz) | 84 | `macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc` | `19.47s` | `23.84s` | `1.22×` faster (`-18.3%`) | Equal top-level Alpine package count with Alpine-native installed-db dependency requirements and virtual providers preserved, plus cleaner BusyBox/OpenSSL binary-text normalization and richer `os-release` package identity | ## How to extend this document