perf(engine): measurement font caching + glyph-coverage memo (F3+F4) #143

Merged
DemchaAV merged 2 commits into develop from perf/font-measurement-glyph-cache on Jun 8, 2026
DemchaAV commented Jun 8, 2026

Two hot-path optimizations on the canonical font / measurement / render path. Each removes real repeated work, and both are verified byte-identical in output against the visual + snapshot suite and a width-parity check. Performance figures come from deterministic probes/counters, not wall-clock timing.

F4 — measurement no longer embeds binary fonts into a throwaway document

PdfMeasurementResources used to subset-embed every Google/custom font family into a per-session PDDocument that was immediately discarded, and it repeated that work on every new DocumentSession (one per server render). Binary families now resolve to a per-thread cached PDType0Font bound to a reusable, never-saved document, so a face embeds once per worker thread and PdfMeasurementResources no longer owns a document.

  • Byte-identical widths / metrics — both paths read the same parsed TrueTypeFont. MeasurementFontParityTest checks 30 families × 4 faces, max |Δ| = 0.
  • Per-open embed waste drops by 94-97% in allocation and 80-99% in time (FontEmbedProbe, warm median). This applies to Google/custom-font documents only; standard-14-only documents are unaffected.
  • The thread-local cache is uncapped on purpose: evicting a face (for example with an LRU) would force a later reload that re-embeds it into the never-pruned measurement document, so eviction would grow that document instead of bounding it.
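
The per-thread caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `PerThreadFontCache` and its `loader` function are hypothetical names, and the loader stands in for whatever actually loads a PDType0Font into the reusable measurement document. The map is deliberately unbounded, mirroring the "uncapped on purpose" choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a per-thread, unbounded cache.
 * K identifies a font face (e.g. family + style); F is the loaded font object.
 */
final class PerThreadFontCache<K, F> {

    // Stand-in for the expensive "load and embed this face" step.
    private final Function<K, F> loader;

    // One private map per thread: no locking needed, and no eviction,
    // so a face is loaded at most once per worker thread.
    private final ThreadLocal<Map<K, F>> perThread =
            ThreadLocal.withInitial(HashMap::new);

    PerThreadFontCache(Function<K, F> loader) {
        this.loader = loader;
    }

    /** Returns the cached font for this thread, loading it on first use. */
    F get(K key) {
        return perThread.get().computeIfAbsent(key, loader);
    }
}
```

With a cache like this, the first `get` for a face on a given worker thread pays the load cost and later calls on that thread are map lookups; a different thread loads its own copy.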

F3 — glyph coverage is memoized instead of re-probed per glyph

GlyphFallbackLogger.sanitize (shared by paragraph spans, table cells, watermark + header/footer chrome, and width measurement) called PDFont.encode for every code point of every string — a String allocation per glyph and a thrown+caught exception per unencodable glyph — at measurement and again at render. Coverage is now memoized per (font, code point): encode runs once per distinct glyph, then a map lookup; kept glyphs append by code point with no per-glyph String.

  • Byte-identical output: same encode decision, only cached; the glyph-fallback warn cadence is unchanged. Pinned by PdfFontSanitizerTest output assertions plus memo tests (4 probes for "banana banana", 0 on repeat), counted via a test-scope counting font with no instrumentation in the production class.
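
As a rough illustration of the memoization described above, the sketch below caches the "can this font encode this code point" decision for one font. It is not the production GlyphFallbackLogger: `GlyphCoverageMemo` is a hypothetical name, the `IntPredicate` stands in for a PDFont.encode probe, and replacing unencodable glyphs with a fixed character is an assumption made only for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical sketch: coverage memo for a single font. */
final class GlyphCoverageMemo {

    // Stand-in for "encoding this code point with the font succeeds".
    private final IntPredicate canEncode;

    // Code point -> coverage decision; filled lazily, one probe per distinct glyph.
    private final Map<Integer, Boolean> coverage = new HashMap<>();

    GlyphCoverageMemo(IntPredicate canEncode) {
        this.canEncode = canEncode;
    }

    /** Keeps covered glyphs and substitutes the replacement for the rest (assumed policy). */
    String sanitize(String text, char replacement) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            // Probe only on the first sighting of this code point.
            boolean covered = coverage.computeIfAbsent(cp, c -> canEncode.test(c));
            if (covered) {
                out.appendCodePoint(cp); // append by code point, no per-glyph String
            } else {
                out.append(replacement);
            }
        }
        return out.toString();
    }
}
```

For "banana banana" the distinct code points are b, a, n and the space, so a counting predicate sees four probes on the first call and none on a repeat. The `Integer` map keys still box code points outside the small-integer cache.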

Verification

  • ./mvnw verify -pl . — BUILD SUCCESS, 1149 tests, 0 failures (checkstyle + SpotBugs + javadoc).
  • benchmarks module: 28 tests; examples smoke: 1 test (regenerates all example PDFs); full JMH suite (-foe true): exit 0, no render errors.

Follow-ups (separate PRs)

  • F3c — carry the sanitized span/cell text to render to drop the second sanitize pass.
  • F3 zero-alloc — per-font BitSet-pair glyph coverage to remove the residual autobox on non-ASCII.
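
One way the BitSet-pair idea in the second follow-up could look is sketched below. This is speculative and not taken from any existing code: `BitSetGlyphCoverage` is a hypothetical name, and the `IntPredicate` again stands in for the font's encode probe. One BitSet records which code points have been probed and the other records which of those are covered, so lookups work on primitive ints with no boxing.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical sketch: per-font glyph coverage held in two BitSets. */
final class BitSetGlyphCoverage {

    // Stand-in for "encoding this code point with the font succeeds".
    private final IntPredicate canEncode;

    // probed: code points already checked; covered: those the font can encode.
    private final BitSet probed = new BitSet();
    private final BitSet covered = new BitSet();

    BitSetGlyphCoverage(IntPredicate canEncode) {
        this.canEncode = canEncode;
    }

    /** Probes on first sighting, then answers from the bit sets. */
    boolean isCovered(int codePoint) {
        if (!probed.get(codePoint)) {
            probed.set(codePoint);
            if (canEncode.test(codePoint)) {
                covered.set(codePoint);
            }
        }
        return covered.get(codePoint);
    }
}
```

The trade-off in this sketch is memory: a BitSet grows to the highest index set, so a single supplementary-plane code point can enlarge each set to roughly 136 KB per font.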

DemchaAV added 2 commits June 8, 2026 23:04
…embed

Measurement used to subset-embed every binary (Google/custom) font family into a
per-session PDDocument that was immediately discarded, repeated on every new
DocumentSession (one per server render). Resolve binary families to a per-thread
cached PDType0Font bound to a reusable, never-saved document instead, so a face
embeds once per worker thread; PdfMeasurementResources no longer owns a document.

Widths, vertical metrics and glyph coverage stay byte-identical to the render
font (both read the same parsed TrueTypeFont), proven by MeasurementFontParityTest
(30 families x 4 faces, max|delta| = 0) and the visual/snapshot suite. The
per-open embed waste drops ~94-97% (FontEmbedProbe). Standard-14-only documents
are unaffected.

Finding 4.
… glyph

GlyphFallbackLogger.sanitize (shared by paragraph spans, table cells, watermark
and header/footer chrome, and by width measurement) called PDFont.encode for
every code point of every string, allocating a String per glyph and throwing a
caught exception per unencodable glyph, at measurement and again at render.

Memoize coverage per (font, code point): encode runs once per distinct glyph,
then a map lookup; kept glyphs append by code point with no per-glyph String.
Output is byte-identical (same encode decision, cached; warn cadence unchanged),
pinned by PdfFontSanitizerTest output assertions plus new memo tests (4 probes
for "banana banana", 0 on repeat, counted via a test-scope counting font).

Finding 3.
DemchaAV merged commit 8e895f6 into develop Jun 8, 2026
11 checks passed
DemchaAV deleted the perf/font-measurement-glyph-cache branch June 8, 2026 23:27