DemchaAV · May 31, 2026
@@ -21,6 +21,17 @@ JitPack continue to resolve through the existing coordinates.
   `./mvnw -DskipTests -P japicmp verify -pl .`; HTML/MD/XML reports
   land in `target/japicmp/`. JitPack repository is scoped to the
   `japicmp` profile, so downstream consumers do not inherit it.
+- **New `benchmarks/README.md`** (Track B1). Honest framing for the
+  manual benchmark layer ahead of the Maven Central debut: explicitly
+  positions the harness as a smoke / diff / endurance tool — not a
+  JMH-grade benchmark — and tells callers when *not* to use it
+  (publishable performance claims, architectural decisions,
+  cross-library comparisons that read too much into a single number).
+  Documents the file-by-file role of each runner / report tool, the
+  exact CI smoke invocation, and a "How to read a report" cheat sheet.
+  Cross-links the planned JMH chain (Track C, B3 → B6 in 1.7.0) so a
+  reader knows what's coming and how to identify "rigorous"
+  measurements when they arrive.
 - **Class-level `@since 1.0.0` Javadoc on the public entry-point
   surface** (Track H1). 26 public types in the canonical user-reached
   packages (`com.demcha.compose.GraphCompose`, `com.demcha.compose.document.api.{DocumentSession, DocumentPageSize, PageBackgroundFill}`,

@@ -0,0 +1,134 @@
+# GraphCompose Benchmarks Module
+
+> **What this is.** A **manual performance harness** for GraphCompose —
+> a small set of Java programs that render representative documents
+> repeatedly and report rough numbers (latency, throughput, byte size,
+> peak memory) to a JSON / CSV / text report.
+>
+> **What this is _not_.** A JMH-grade benchmark. There is no warmup
+> control, no forked JVM, no per-measurement reset, no GC profiling
+> beyond what JFR / `-verbose:gc` can pick up out-of-band. Numbers
+> produced here are **rough local comparisons** suitable for "did this
+> change regress something obviously?" — not for public marketing
+> claims, not for cross-machine performance comparisons, and not for
+> answering "how does GraphCompose compare to iText / openHTMLToPDF /
+> JasperReports?" with rigour.
+>
+> A separate JMH layer (sibling chain Track C: B3 → B4 → B5 → B6 in the
+> 1.7.0 plan) will sit alongside this harness when it lands. Until
+> then, treat these numbers as **smoke-test fidelity, not benchmark
+> fidelity**.
+
+## When to use the harness
+
+- **Smoke check before a release** — `CurrentSpeedBenchmark -Dgraphcompose.benchmark.profile=smoke`
+  takes ~15 s, exercises the canonical render path through 5 fixture
+  scenarios, and prints a single-page latency / throughput table.
+  CI runs this on every PR (the `perf-smoke` job); the goal is "did
+  this PR make a representative render visibly slower?" — *not* "is
+  this number a publishable performance claim".
+
+- **Pre/post comparison on a single machine** — render a fixture
+  before and after a layout change, run `BenchmarkDiffTool` against
+  the two JSON reports, and eyeball the delta. Run-to-run variance is in
+  the single-digit percent range; treat deltas inside ±5 % as noise on an
+  ordinary developer machine, and tighten the threshold only when comparing
+  on a quiescent system with a fixed CPU frequency.
+
+- **Stress / endurance check** — `GraphComposeStressTest` and
+  `EnduranceTest` drive higher-cardinality fixtures over longer
+  windows to catch GC pressure spikes or memory leaks that a single
+  smoke run wouldn't surface. Run by hand; not on CI by default.
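+
+The ±5 % noise band from the pre/post bullet can be sketched as a small
+check. This is an illustration only, not `BenchmarkDiffTool`'s actual
+code: the class `DeltaCheck`, its method names, and the sample timings
+are hypothetical.
+
+```java
+public final class DeltaCheck {
+
+    /** Relative change of afterMs versus beforeMs, in percent. */
+    static double deltaPercent(double beforeMs, double afterMs) {
+        return (afterMs - beforeMs) / beforeMs * 100.0;
+    }
+
+    /** Labels a delta as noise when it falls inside +/- noisePercent. */
+    static String classify(double beforeMs, double afterMs, double noisePercent) {
+        double delta = deltaPercent(beforeMs, afterMs);
+        if (Math.abs(delta) <= noisePercent) {
+            return "noise";
+        }
+        return delta > 0 ? "slower" : "faster";
+    }
+
+    public static void main(String[] args) {
+        // Hypothetical avgMs values from two report runs of the same scenario.
+        System.out.println(classify(100.0, 103.0, 5.0)); // inside the band
+        System.out.println(classify(100.0, 112.0, 5.0)); // outside the band
+    }
+}
+```
+
+A tighter `noisePercent` only makes sense under the quiescent-system
+conditions described above.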
+
+## When **not** to use the harness
+
+- For a **published "X% faster than Y" claim** of any kind — the
+  numbers are not statistically rigorous and the comparison setup is
+  not reproducible across machines / JDKs.
+- For **deciding between two architecturally different approaches** —
+  pick the right invariant (allocation count, big-O of the algorithm,
+  layout-pass count) and reason about it; the harness is a sanity
+  check after you've already chosen, not a decision tool before.
+- For **comparing GraphCompose to another PDF library** —
+  `ComparativeBenchmark` does render the same fixture through iText /
+  openHTMLToPDF / JasperReports for rough sizing, but the comparison
+  is a manual smoke test: each library has different defaults
+  (compression, font embedding, image resampling) and reading too much
+  into a single number is the wrong call.
+
+## Files in this module
+
+| File | Role |
+|---|---|
+| `CurrentSpeedBenchmark` | Default scenario runner — what CI's `perf-smoke` job exercises. Takes a `-Dgraphcompose.benchmark.profile=smoke\|full\|stress` switch. |
+| `ComparativeBenchmark` | Renders the same fixtures through GraphCompose, iText, openHTMLToPDF, JasperReports. **Rough local comparison only** — see "When not to use" above. |
+| `FullCvBenchmark`, `ScalabilityBenchmark` | Fixture-specific runners for CV and table-heavy scenarios. |
+| `CanonicalBenchmarkSupport`, `BenchmarkSupport` | Shared fixture builders + measurement helpers. |
+| `BenchmarkReportWriter` | Writes JSON / CSV / text reports under `benchmarks/target/benchmarks/`. |
+| `BenchmarkDiffTool` | Compares two JSON reports and prints a delta table. Useful for pre/post comparisons. |
+| `BenchmarkMedianTool` | Median + dispersion across N runs of the same scenario. |
+| `GraphComposeStressTest`, `EnduranceTest` | Long-running stress / endurance harnesses. |
+| `GraphComposeBenchmark` | Legacy entry point preserved for one downstream caller. New work should target `CurrentSpeedBenchmark`. |
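+
+As a rough illustration of "median + dispersion across N runs", the
+sketch below computes a median and a median absolute deviation (MAD).
+The dispersion measure `BenchmarkMedianTool` actually reports is not
+specified here, so MAD is an assumption, and `MedianSketch` and its
+sample values are hypothetical.
+
+```java
+import java.util.Arrays;
+
+public final class MedianSketch {
+
+    /** Median of the values; the input array is left unmodified. */
+    static double median(double[] values) {
+        double[] sorted = values.clone();
+        Arrays.sort(sorted);
+        int n = sorted.length;
+        return n % 2 == 1
+                ? sorted[n / 2]
+                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
+    }
+
+    /** Median absolute deviation: median of |x - median(x)|. */
+    static double mad(double[] values) {
+        double m = median(values);
+        double[] deviations = Arrays.stream(values)
+                .map(v -> Math.abs(v - m))
+                .toArray();
+        return median(deviations);
+    }
+
+    public static void main(String[] args) {
+        // Hypothetical avgMs from five runs of one scenario; one run is an outlier.
+        double[] avgMsPerRun = {20.0, 21.0, 19.0, 35.0, 20.0};
+        System.out.println("median=" + median(avgMsPerRun) + " mad=" + mad(avgMsPerRun));
+    }
+}
+```
+
+Median and MAD are used here because a single slow run (a GC pause, a
+background process) shifts them far less than it shifts a mean.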
+
+## Running
+
+From the repo root:
+
+```bash
+# Smoke profile (~15s) — what CI runs on every PR
+./mvnw -B -ntp -f benchmarks/pom.xml -DskipTests \
+    exec:java \
+    -Dexec.mainClass=com.demcha.compose.CurrentSpeedBenchmark \
+    -Dgraphcompose.benchmark.profile=smoke
+
+# Diff two existing report runs under the same scenario
+./mvnw -B -ntp -f benchmarks/pom.xml -DskipTests \
+    exec:java \
+    -Dexec.mainClass=com.demcha.compose.BenchmarkDiffTool \
+    -Dexec.args="current-speed"
+```
+
+Reports land in `benchmarks/target/benchmarks/<scenario>/`. The CI
+`perf-smoke` job uploads the smoke directory as an artifact for every
+PR run, so a regression can be diffed against the previous PR's run
+without reproducing locally.
+
+## How to read a report
+
+The JSON shape is intentionally simple — a top-level run record with
+per-scenario sub-records. Each sub-record carries:
+
+- `avgMs`, `p50Ms`, `p95Ms`, `maxMs` — latency distribution across
+  iterations within the run.
+- `docsPerSec` — rough throughput; **not statistically rigorous**,
+  intended only as a relative number against a sibling scenario or a
+  previous run on the same machine.
+- `avgKB` — average output byte size. Stable across runs on the same
+  fixture; useful for catching content corruption (a size shift of more
+  than a few hundred bytes is usually a bug, not benchmark fluctuation).
+- `peakMB` — peak heap as observed by `MemoryMXBean`; coarse, do not
+  use for memory-budget enforcement.
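+
+To make the latency fields concrete, the sketch below derives them from
+a list of per-iteration timings. It assumes a nearest-rank percentile
+and a throughput derived from the mean latency; the harness may compute
+either differently. `LatencySummary` and the sample timings are
+hypothetical.
+
+```java
+import java.util.Arrays;
+import java.util.Locale;
+
+public final class LatencySummary {
+
+    /** Nearest-rank percentile over an ascending-sorted array; p in (0, 100]. */
+    static double percentile(double[] sortedMs, double p) {
+        int rank = (int) Math.ceil(p / 100.0 * sortedMs.length);
+        return sortedMs[Math.max(rank, 1) - 1];
+    }
+
+    /** Arithmetic mean of the timings. */
+    static double average(double[] ms) {
+        return Arrays.stream(ms).average().orElse(Double.NaN);
+    }
+
+    /** Throughput implied by the mean latency of back-to-back renders. */
+    static double docsPerSec(double avgMs) {
+        return 1000.0 / avgMs;
+    }
+
+    public static void main(String[] args) {
+        // Hypothetical per-iteration render times in milliseconds.
+        double[] sorted = {12.0, 10.0, 11.0, 40.0, 10.5};
+        Arrays.sort(sorted);
+        double avg = average(sorted);
+        System.out.printf(Locale.ROOT,
+                "avgMs=%.2f p50Ms=%.2f p95Ms=%.2f maxMs=%.2f docsPerSec=%.1f%n",
+                avg, percentile(sorted, 50), percentile(sorted, 95),
+                sorted[sorted.length - 1], docsPerSec(avg));
+    }
+}
+```
+
+With only a handful of iterations, `p95Ms` collapses onto `maxMs`, which
+is one reason these numbers are smoke-test fidelity only.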
+
+## Roadmap
+
+The 1.7.0 plan (Track C, B3 → B4 → B5 → B6) introduces a sibling JMH
+layer:
+
+- **B3** — pull fixtures into a `fixtures/` package with deterministic
+  seeds so the JMH layer can reuse them.
+- **B4** — JMH infrastructure (`jmh-core`, `jmh-generator-annprocess`,
+  shade plugin) + first benchmark (`SimpleDocumentJmhBenchmark`).
+- **B5** — Invoice / CV / LargeTable / PdfRender JMH benchmarks.
+- **B6** — CI job that runs the JMH layer on a `workflow_dispatch` /
+  weekly cadence and uploads `*.json` reports as artifacts.
+
+Once that chain is in place, any *public* performance claim should
+quote the JMH layer's numbers, with explicit warmup / measurement /
+fork configuration in the source. This manual harness will stay for
+the smoke / diff / endurance roles described above.
+
+---
+
+*This page is the source of truth for what the manual benchmark layer
+is and is not. When in doubt — and especially before quoting a number
+in a public communication — re-read the "When not to use" section.*