From e864afd7e005375395950e5fe2829b4fed02af3b Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 31 May 2026 13:16:23 +0100
Subject: [PATCH] docs(benchmarks): add honest-framing README for the manual
 benchmark layer (B1)

Wires up Track B1 from the v1.6.5->1.7 readiness taskboard - the honest-claims prerequisite before the 1.6.6 Maven Central debut. There is no README in the benchmarks module today; this PR adds one and explicitly positions the harness as a smoke / diff / endurance tool, not a JMH-grade benchmark.

Sections: what this is and is not, when to use, when NOT to use (publishable claims, architectural decisions, cross-library comparisons), file-by-file role table, exact CI smoke invocation, how to read a report, JMH roadmap (Track C B3 to B6 in 1.7.0).

Tone: senior-defensive. Numbers from this harness are 'rough local comparisons suitable for did this change regress something obviously', not 'X percent faster than Y'. The aim is to make sure no maintainer or downstream caller quotes a number from this layer in a public communication once Central artefacts start shipping.

Verification: ./mvnw test -pl . -Dtest='CanonicalSurfaceGuardTest,DocumentationCoverageTest' - 7 tests, 0 failures (~18s). benchmarks/ has no legacy API tokens so no allowlist entry needed.
---
 CHANGELOG.md         |  11 ++++
 benchmarks/README.md | 134 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 145 insertions(+)
 create mode 100644 benchmarks/README.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b62e1fe8..40679e3b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -21,6 +21,17 @@ JitPack continue to resolve through the existing coordinates.
   `./mvnw -DskipTests -P japicmp verify -pl .`; HTML/MD/XML reports
   land in `target/japicmp/`. JitPack repository is scoped to the
   `japicmp` profile, so downstream consumers do not inherit it.
+- **New `benchmarks/README.md`** (Track B1). Honest framing for the
+  manual benchmark layer ahead of the Maven Central debut: explicitly
+  positions the harness as a smoke / diff / endurance tool — not a
+  JMH-grade benchmark — and tells callers when *not* to use it
+  (publishable performance claims, architectural decisions,
+  cross-library comparisons that read too much into a single number).
+  Documents the file-by-file role of each runner / report tool, the
+  exact CI smoke invocation, and a "How to read a report" cheat sheet.
+  Cross-links the planned JMH chain (Track C, B3 → B6 in 1.7.0) so a
+  reader knows what's coming and how to identify "rigorous"
+  measurements when they arrive.
 - **Class-level `@since 1.0.0` Javadoc on the public entry-point
   surface** (Track H1). 26 public types in the canonical user-reached
   packages (`com.demcha.compose.GraphCompose`, `com.demcha.compose.document.api.{DocumentSession, DocumentPageSize, PageBackgroundFill}`,
diff --git a/benchmarks/README.md b/benchmarks/README.md
new file mode 100644
index 00000000..b61e6063
--- /dev/null
+++ b/benchmarks/README.md
@@ -0,0 +1,134 @@
+# GraphCompose Benchmarks Module
+
+> **What this is.** A **manual performance harness** for GraphCompose —
+> a small set of Java programs that render representative documents
+> repeatedly and report rough numbers (latency, throughput, byte size,
+> peak memory) to a JSON / CSV / text report.
+>
+> **What this is _not_.** A JMH-grade benchmark. There is no warmup
+> control, no forked JVM, no per-measurement reset, no GC profiling
+> beyond what JFR / `-verbose:gc` can pick up out-of-band. Numbers
+> produced here are **rough local comparisons** suitable for "did this
+> change regress something obviously?" — not for public marketing
+> claims, not for cross-machine performance comparisons, and not for
+> answering "how does GraphCompose compare to iText / openHTMLToPDF /
+> JasperReports?" with rigour.
+>
+> A separate JMH layer (sibling chain Track C: B3 → B4 → B5 → B6 in the
+> 1.7.0 plan) will sit alongside this harness when it lands. Until
+> then, treat these numbers as **smoke-test fidelity, not benchmark
+> fidelity**.
+
+## When to use the harness
+
+- **Smoke check before a release** — `CurrentSpeedBenchmark -Dgraphcompose.benchmark.profile=smoke`
+  takes ~15 s, exercises the canonical render path through 5 fixture
+  scenarios, and prints a single-page latency / throughput table.
+  CI runs this on every PR (the `perf-smoke` job); the goal is "did
+  this PR make a representative render visibly slower?" — *not* "is
+  this number a publishable performance claim".
+
+- **Pre/post comparison on a single machine** — render a fixture
+  before and after a layout change, run `BenchmarkDiffTool` against
+  the two JSON reports, and eyeball the delta. Run-to-run variance is
+  in the single-digit percent range; treat deltas inside ±5 % as noise
+  on an ordinary developer machine, and tighten the threshold only when
+  comparing on a quiescent system with a fixed CPU frequency.
+
+- **Stress / endurance check** — `GraphComposeStressTest` and
+  `EnduranceTest` drive higher-cardinality fixtures over longer
+  windows to catch GC pressure spikes or memory leaks that a single
+  smoke run wouldn't surface. Run by hand; not on CI by default.
+
+## When **not** to use the harness
+
+- For a **published "X% faster than Y" claim** of any kind — the
+  numbers are not statistically rigorous and the comparison setup is
+  not reproducible across machines / JDKs.
+- For **deciding between two architecturally different approaches** —
+  pick the right invariant (allocation count, big-O of the algorithm,
+  layout-pass count) and reason about it; the harness is a sanity
+  check after you've already chosen, not a decision tool before.
+- For **comparing GraphCompose to another PDF library** —
+  `ComparativeBenchmark` does render the same fixture through iText /
+  openHTMLToPDF / JasperReports for rough sizing, but the comparison
+  is a manual smoke test: each library has different defaults
+  (compression, font embedding, image resampling) and reading too much
+  into a single number is the wrong call.
+
+## Files in this module
+
+| File | Role |
+|---|---|
+| `CurrentSpeedBenchmark` | Default scenario runner — what CI's `perf-smoke` job exercises. Takes a `-Dgraphcompose.benchmark.profile=smoke\|full\|stress` switch. |
+| `ComparativeBenchmark` | Renders the same fixtures through GraphCompose, iText, openHTMLToPDF, JasperReports. **Rough local comparison only** — see "When not to use" above. |
+| `FullCvBenchmark`, `ScalabilityBenchmark` | Fixture-specific runners for CV and table-heavy scenarios. |
+| `CanonicalBenchmarkSupport`, `BenchmarkSupport` | Shared fixture builders + measurement helpers. |
+| `BenchmarkReportWriter` | Writes JSON / CSV / text reports under `benchmarks/target/benchmarks/`. |
+| `BenchmarkDiffTool` | Compares two JSON reports and prints a delta table. Useful for pre/post comparisons. |
+| `BenchmarkMedianTool` | Reports the median and dispersion across N runs of the same scenario. |
+| `GraphComposeStressTest`, `EnduranceTest` | Long-running stress / endurance harnesses. |
+| `GraphComposeBenchmark` | Legacy entry point preserved for one downstream caller. New work should target `CurrentSpeedBenchmark`. |
+
+## Running
+
+From the repo root:
+
+```bash
+# Smoke profile (~15s) — what CI runs on every PR
+./mvnw -B -ntp -f benchmarks/pom.xml -DskipTests \
+    exec:java \
+    -Dexec.mainClass=com.demcha.compose.CurrentSpeedBenchmark \
+    -Dgraphcompose.benchmark.profile=smoke
+
+# Diff two existing report runs under the same scenario
+./mvnw -B -ntp -f benchmarks/pom.xml -DskipTests \
+    exec:java \
+    -Dexec.mainClass=com.demcha.compose.BenchmarkDiffTool \
+    -Dexec.args="current-speed"
+```
+
+Reports land in `benchmarks/target/benchmarks/<scenario>/`. The CI
+`perf-smoke` job uploads the smoke directory as an artifact for every
+PR run, so a regression can be diffed against the previous PR's run
+without reproducing locally.
+
+## How to read a report
+
+The JSON shape is intentionally simple — a top-level run record with
+per-scenario sub-records. Each sub-record carries:
+
+- `avgMs`, `p50Ms`, `p95Ms`, `maxMs` — latency distribution across
+  iterations within the run.
+- `docsPerSec` — rough throughput; **not statistically rigorous**,
+  intended only as a relative number against a sibling scenario or a
+  previous run on the same machine.
+- `avgKB` — average output byte size. Stable across runs on the same
+  fixture; useful for catching content corruption (a size shift of more
+  than a few hundred bytes is usually a bug, not a benchmark fluctuation).
+- `peakMB` — peak heap as observed by `MemoryMXBean`; coarse, do not
+  use for memory-budget enforcement.
+
+## Roadmap
+
+The 1.7.0 plan (Track C, B3 → B4 → B5 → B6) introduces a sibling JMH
+layer:
+
+- **B3** — pull fixtures into a `fixtures/` package with deterministic
+  seeds so the JMH layer can reuse them.
+- **B4** — JMH infrastructure (`jmh-core`, `jmh-generator-annprocess`,
+  shade plugin) + first benchmark (`SimpleDocumentJmhBenchmark`).
+- **B5** — Invoice / CV / LargeTable / PdfRender JMH benchmarks.
+- **B6** — CI job that runs the JMH layer on a `workflow_dispatch` /
+  weekly cadence and uploads `*.json` reports as artifacts.
+
+Once that chain is in place, any *public* performance claim should
+quote the JMH layer's numbers, with explicit warmup / measurement /
+fork configuration in the source. This manual harness will stay for
+the smoke / diff / endurance roles described above.
+
+---
+
+*This page is the source of truth for what the manual benchmark layer
+is and is not. When in doubt — and especially before quoting a number
+in a public communication — re-read the "When not to use" section.*
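
Reviewer note (outside the patch body): the README's pre/post rule of thumb, "treat deltas inside ±5 % as noise", can be illustrated with a minimal sketch. The class `DeltaNoiseSketch`, its methods, and the `NOISE_BAND_PERCENT` constant are hypothetical names made up for this note; this is not the logic of `BenchmarkDiffTool`, whose implementation is not shown in this patch. It only assumes two latency readings (for example two `avgMs` values) taken on the same machine.

```java
import java.util.Locale;

/** Hypothetical illustration of the README's ±5 % noise band; not part of the module. */
public final class DeltaNoiseSketch {

    /** Assumed threshold, taken from the README's rule of thumb. */
    public static final double NOISE_BAND_PERCENT = 5.0;

    public enum Verdict { NOISE, SLOWER, FASTER }

    /** Relative change of afterMs against beforeMs, in percent. */
    public static double percentDelta(double beforeMs, double afterMs) {
        if (beforeMs <= 0.0) {
            throw new IllegalArgumentException("beforeMs must be positive");
        }
        return (afterMs - beforeMs) / beforeMs * 100.0;
    }

    /** Deltas inside the band count as noise; outside it, the sign decides. */
    public static Verdict classify(double beforeMs, double afterMs) {
        double delta = percentDelta(beforeMs, afterMs);
        if (Math.abs(delta) <= NOISE_BAND_PERCENT) {
            return Verdict.NOISE;
        }
        return delta > 0 ? Verdict.SLOWER : Verdict.FASTER;
    }

    public static void main(String[] args) {
        // Made-up sample readings, purely for demonstration.
        double[][] samples = { {20.0, 20.6}, {20.0, 23.0}, {20.0, 17.0} };
        for (double[] s : samples) {
            System.out.printf(Locale.ROOT, "%.1f ms -> %.1f ms: %+.1f%% (%s)%n",
                    s[0], s[1], percentDelta(s[0], s[1]), classify(s[0], s[1]));
        }
    }
}
```

A verdict of `SLOWER` or `FASTER` from a sketch like this is still only a prompt to look closer on the same machine; per the README, it is not a publishable figure.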