diff --git a/CHANGELOG.md b/CHANGELOG.md
index b62e1fe8..40679e3b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -21,6 +21,17 @@ JitPack continue to resolve through the existing coordinates.
   `./mvnw -DskipTests -P japicmp verify -pl .`; HTML/MD/XML reports land
   in `target/japicmp/`. JitPack repository is scoped to the `japicmp`
   profile, so downstream consumers do not inherit it.
+- **New `benchmarks/README.md`** (Track B1). Honest framing for the
+  manual benchmark layer ahead of the Maven Central debut: explicitly
+  positions the harness as a smoke / diff / endurance tool, not a
+  JMH-grade benchmark, and tells callers when *not* to use it
+  (publishable performance claims, architectural decisions,
+  cross-library comparisons that read too much into a single number).
+  Documents the file-by-file role of each runner / report tool, the
+  exact CI smoke invocation, and a "How to read a report" cheat sheet.
+  Cross-links the planned JMH chain (Track C, B3 → B6 in 1.7.0) so a
+  reader knows what's coming and how to identify "rigorous"
+  measurements when they arrive.
 - **Class-level `@since 1.0.0` Javadoc on the public entry-point surface**
   (Track H1). 26 public types in the canonical user-reached packages
   (`com.demcha.compose.GraphCompose`, `com.demcha.compose.document.api.{DocumentSession, DocumentPageSize, PageBackgroundFill}`,
diff --git a/benchmarks/README.md b/benchmarks/README.md
new file mode 100644
index 00000000..b61e6063
--- /dev/null
+++ b/benchmarks/README.md
@@ -0,0 +1,134 @@
+# GraphCompose Benchmarks Module
+
+> **What this is.** A **manual performance harness** for GraphCompose:
+> a small set of Java programs that render representative documents
+> repeatedly and report rough numbers (latency, throughput, byte size,
+> peak memory) to a JSON / CSV / text report.
+>
+> **What this is _not_.** A JMH-grade benchmark.
+> There is no warmup
+> control, no forked JVM, no per-measurement reset, no GC profiling
+> beyond what JFR / `-verbose:gc` can pick up out-of-band. Numbers
+> produced here are **rough local comparisons** suitable for "did this
+> change regress something obviously?", not for public marketing
+> claims, not for cross-machine performance comparisons, and not for
+> answering "how does GraphCompose compare to iText / openHTMLToPDF /
+> JasperReports?" with rigour.
+>
+> A separate JMH layer (sibling chain Track C: B3 → B4 → B5 → B6 in the
+> 1.7.0 plan) will sit alongside this harness when it lands. Until
+> then, treat these numbers as **smoke-test fidelity, not benchmark
+> fidelity**.
+
+## When to use the harness
+
+- **Smoke check before a release**: `CurrentSpeedBenchmark -Dgraphcompose.benchmark.profile=smoke`
+  takes ~15 s, exercises the canonical render path through 5 fixture
+  scenarios, and prints a single-page latency / throughput table.
+  CI runs this on every PR (the `perf-smoke` job); the goal is "did
+  this PR make a representative render visibly slower?", *not* "is
+  this number a publishable performance claim".
+
+- **Pre/post comparison on a single machine**: render a fixture
+  before and after a layout change, run `BenchmarkDiffTool` against
+  the two JSON reports, and eyeball the delta. Run-to-run variance is
+  in the single-digit percent range; treat deltas inside ±5 % as noise
+  on the default machine and tighten the threshold only when comparing
+  on a quiescent system with a fixed CPU frequency.
+
+- **Stress / endurance check**: `GraphComposeStressTest` and
+  `EnduranceTest` drive higher-cardinality fixtures over longer
+  windows to catch GC pressure spikes or memory leaks that a single
+  smoke run wouldn't surface. Run by hand; not on CI by default.
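
The ±5 % noise band from the pre/post bullet can be applied mechanically once two `avgMs` values are in hand. The following is a minimal sketch under stated assumptions: `DeltaCheck`, `deltaPercent`, and `classify` are hypothetical names that do not exist in the benchmarks module, and the numbers in `main` are made up. It is not how `BenchmarkDiffTool` works internally.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the benchmarks module.
final class DeltaCheck {

    enum Verdict { NOISE, SLOWER, FASTER }

    private DeltaCheck() {}

    /** Relative change of the candidate against the baseline, in percent. */
    static double deltaPercent(double baselineMs, double candidateMs) {
        if (baselineMs <= 0.0) {
            throw new IllegalArgumentException("baselineMs must be positive");
        }
        return (candidateMs - baselineMs) / baselineMs * 100.0;
    }

    /** Treats anything inside +/- noiseBandPercent as noise, as the README suggests for 5 %. */
    static Verdict classify(double baselineMs, double candidateMs, double noiseBandPercent) {
        double delta = deltaPercent(baselineMs, candidateMs);
        if (Math.abs(delta) <= noiseBandPercent) {
            return Verdict.NOISE;
        }
        return delta > 0 ? Verdict.SLOWER : Verdict.FASTER;
    }

    public static void main(String[] args) {
        double baseline = 40.0;  // avgMs from the "before" report (made-up value)
        double candidate = 43.0; // avgMs from the "after" report (made-up value)
        System.out.printf(Locale.ROOT, "delta = %+.1f %% -> %s%n",
                deltaPercent(baseline, candidate),
                classify(baseline, candidate, 5.0));
    }
}
```

Keeping the band as a parameter rather than a constant matches the advice above: widen it on a noisy laptop, tighten it only on a quiescent machine with a fixed CPU frequency.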
+
+## When **not** to use the harness
+
+- For a **published "X% faster than Y" claim** of any kind: the
+  numbers are not statistically rigorous and the comparison setup is
+  not reproducible across machines / JDKs.
+- For **deciding between two architecturally different approaches**:
+  pick the right invariant (allocation count, big-O of the algorithm,
+  layout-pass count) and reason about it; the harness is a sanity
+  check after you've already chosen, not a decision tool before.
+- For **comparing GraphCompose to another PDF library**:
+  `ComparativeBenchmark` does render the same fixture through iText /
+  openHTMLToPDF / JasperReports for rough sizing, but the comparison
+  is a manual smoke test: each library has different defaults
+  (compression, font embedding, image resampling) and reading too much
+  into a single number is the wrong call.
+
+## Files in this module
+
+| File | Role |
+|---|---|
+| `CurrentSpeedBenchmark` | Default scenario runner (what CI's `perf-smoke` job exercises). Takes a `-Dgraphcompose.benchmark.profile=smoke\|full\|stress` switch. |
+| `ComparativeBenchmark` | Renders the same fixtures through GraphCompose, iText, openHTMLToPDF, JasperReports. **Rough local comparison only**; see "When not to use" above. |
+| `FullCvBenchmark`, `ScalabilityBenchmark` | Fixture-specific runners for CV and table-heavy scenarios. |
+| `CanonicalBenchmarkSupport`, `BenchmarkSupport` | Shared fixture builders + measurement helpers. |
+| `BenchmarkReportWriter` | Writes JSON / CSV / text reports under `benchmarks/target/benchmarks/`. |
+| `BenchmarkDiffTool` | Compares two JSON reports and prints a delta table. Useful for pre/post comparisons. |
+| `BenchmarkMedianTool` | Median + dispersion across N runs of the same scenario. |
+| `GraphComposeStressTest`, `EnduranceTest` | Long-running stress / endurance harnesses. |
+| `GraphComposeBenchmark` | Legacy entry point preserved for one downstream caller. New work should target `CurrentSpeedBenchmark`.
|
+
+## Running
+
+From the repo root:
+
+```bash
+# Smoke profile (~15s): what CI runs on every PR
+./mvnw -B -ntp -f benchmarks/pom.xml -DskipTests \
+  exec:java \
+  -Dexec.mainClass=com.demcha.compose.CurrentSpeedBenchmark \
+  -Dgraphcompose.benchmark.profile=smoke
+
+# Diff two existing report runs under the same scenario
+./mvnw -B -ntp -f benchmarks/pom.xml -DskipTests \
+  exec:java \
+  -Dexec.mainClass=com.demcha.compose.BenchmarkDiffTool \
+  -Dexec.args="current-speed"
+```
+
+Reports land in `benchmarks/target/benchmarks//`. The CI
+`perf-smoke` job uploads the smoke directory as an artifact for every
+PR run, so a regression can be diffed against the previous PR's run
+without reproducing locally.
+
+## How to read a report
+
+The JSON shape is intentionally simple: a top-level run record with
+per-scenario sub-records. Each sub-record carries:
+
+- `avgMs`, `p50Ms`, `p95Ms`, `maxMs`: latency distribution across
+  iterations within the run.
+- `docsPerSec`: rough throughput; **not statistically rigorous**,
+  intended only as a relative number against a sibling scenario or a
+  previous run on the same machine.
+- `avgKB`: average output byte size. Stable across runs on the same
+  fixture; useful for catching content corruption (size shifts of
+  more than a few hundred bytes are usually a bug, not a benchmark fluctuation).
+- `peakMB`: peak heap as observed by `MemoryMXBean`; coarse, do not
+  use for memory-budget enforcement.
+
+## Roadmap
+
+The 1.7.0 plan (Track C, B3 → B4 → B5 → B6) introduces a sibling JMH
+layer:
+
+- **B3**: pull fixtures into a `fixtures/` package with deterministic
+  seeds so the JMH layer can reuse them.
+- **B4**: JMH infrastructure (`jmh-core`, `jmh-generator-annprocess`,
+  shade plugin) + first benchmark (`SimpleDocumentJmhBenchmark`).
+- **B5**: Invoice / CV / LargeTable / PdfRender JMH benchmarks.
+- **B6**: CI job that runs the JMH layer on a `workflow_dispatch` /
+  weekly cadence and uploads `*.json` reports as artifacts.
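
To make the "deterministic seeds" part of B3 concrete, here is a minimal sketch of a seeded fixture builder. Everything in it is an assumption for illustration: `SeededTableFixture`, `tableRows`, and the three-column row shape are hypothetical and say nothing about how the planned `fixtures/` package will actually look. The only point is that an explicit seed passed to `java.util.Random` yields the same rows on every run, which keeps runs comparable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not the planned fixtures/ package.
final class SeededTableFixture {

    private SeededTableFixture() {}

    /** Builds rowCount rows of (label, quantity, unit price in cents). */
    static List<List<String>> tableRows(long seed, int rowCount) {
        // java.util.Random's algorithm is specified, so a fixed seed
        // produces the same sequence on every conforming JVM.
        Random random = new Random(seed);
        List<List<String>> rows = new ArrayList<>(rowCount);
        for (int i = 0; i < rowCount; i++) {
            int quantity = 1 + random.nextInt(20);              // 1..20
            int unitPriceCents = 100 + random.nextInt(100_000); // 100..100099
            rows.add(List.of(
                    "Item " + (i + 1),
                    Integer.toString(quantity),
                    Integer.toString(unitPriceCents)));
        }
        return List.copyOf(rows);
    }

    public static void main(String[] args) {
        List<List<String>> first = tableRows(42L, 1_000);
        List<List<String>> second = tableRows(42L, 1_000);
        System.out.println("same seed, same fixture: " + first.equals(second));
    }
}
```

Passing the seed in, rather than hard-coding it, would let the manual harness and a JMH benchmark share one builder while still pinning the seed in source.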
+
+Once that chain is in place, any *public* performance claim should
+quote the JMH layer's numbers, with explicit warmup / measurement /
+fork configuration in the source. This manual harness will stay for
+the smoke / diff / endurance roles described above.
+
+---
+
+*This page is the source of truth for what the manual benchmark layer
+is and is not. When in doubt, and especially before quoting a number
+in a public communication, re-read the "When not to use" section.*