perf(engine): measurement font caching + glyph-coverage memo (F3+F4) #143
Merged
…embed Measurement used to subset-embed every binary (Google/custom) font family into a per-session PDDocument that was immediately discarded, repeated on every new DocumentSession (one per server render). Resolve binary families to a per-thread cached PDType0Font bound to a reusable, never-saved document instead, so a face embeds once per worker thread; PdfMeasurementResources no longer owns a document. Widths, vertical metrics and glyph coverage stay byte-identical to the render font (both read the same parsed TrueTypeFont), proven by MeasurementFontParityTest (30 families x 4 faces, max|delta| = 0) and the visual/snapshot suite. The per-open embed waste drops ~94-97% (FontEmbedProbe). Standard-14-only documents are unaffected. Finding 4.
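The per-thread caching idea in this commit can be sketched without PDFBox. The class below is a minimal illustration, not the PR's code: `PerThreadFaceCache`, the `faceKey` string, and the loader function are hypothetical names, and the type parameter `F` stands in for a loaded font object such as a PDType0Font. It assumes the expensive step is a per-face load that is safe to repeat once per worker thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-thread face cache (illustration only, not the PR's class).
 * F stands in for a loaded font object such as a PDType0Font.
 */
final class PerThreadFaceCache<F> {
    // Expensive load step (assumed): parse the face and bind it to a reusable document.
    private final Function<String, F> loader;
    // One private map per worker thread, so lookups need no locking.
    private final ThreadLocal<Map<String, F>> faces = ThreadLocal.withInitial(HashMap::new);

    PerThreadFaceCache(Function<String, F> loader) {
        this.loader = loader;
    }

    /** Returns this thread's cached face, running the loader on first use only. */
    F get(String faceKey) {
        return faces.get().computeIfAbsent(faceKey, loader);
    }

    public static void main(String[] args) {
        int[] loads = {0};
        PerThreadFaceCache<String> cache = new PerThreadFaceCache<>(key -> {
            loads[0]++;
            return "face:" + key;
        });
        // Three "sessions" on the same thread ask for the same face.
        for (int session = 0; session < 3; session++) {
            cache.get("Roboto/regular");
        }
        System.out.println("loads = " + loads[0]); // loads = 1
    }
}
```

The trade-off in this sketch is memory for work: each worker thread keeps its own copy of every face it has seen, and in exchange no session after the first pays the load cost on that thread.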
… glyph GlyphFallbackLogger.sanitize (shared by paragraph spans, table cells, watermark and header/footer chrome, and by width measurement) called PDFont.encode for every code point of every string, allocating a String per glyph and throwing a caught exception per unencodable glyph, at measurement and again at render. Memoize coverage per (font, code point): encode runs once per distinct glyph, then a map lookup; kept glyphs append by code point with no per-glyph String. Output is byte-identical (same encode decision, cached; warn cadence unchanged), pinned by PdfFontSanitizerTest output assertions plus new memo tests (4 probes for "banana banana", 0 on repeat, counted via a test-scope counting font). Finding 3.
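The memoization described in this commit can be illustrated with a small stand-in. This is a sketch under assumptions, not the production GlyphFallbackLogger: `CoverageMemo`, its `IntPredicate` probe (standing in for an encode attempt), and the replacement code point are all hypothetical. One instance represents one font, which is how the sketch models the (font, code point) key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical memoized coverage check for a single font (illustration only). */
final class CoverageMemo {
    // Expensive check (assumed): stands in for trying to encode one code point.
    private final IntPredicate probe;
    // Cached decision per code point; the Integer key is boxed on each lookup.
    private final Map<Integer, Boolean> covered = new HashMap<>();
    private int probes;

    CoverageMemo(IntPredicate probe) {
        this.probe = probe;
    }

    /** Probes each distinct code point once, then answers from the map. */
    boolean covers(int codePoint) {
        Boolean known = covered.get(codePoint);
        if (known == null) {
            probes++;
            known = probe.test(codePoint);
            covered.put(codePoint, known);
        }
        return known;
    }

    /** Appends kept glyphs by code point; uncovered ones become the replacement (assumed policy). */
    String sanitize(String text, int replacement) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> out.appendCodePoint(covers(cp) ? cp : replacement));
        return out.toString();
    }

    int probeCount() {
        return probes;
    }

    public static void main(String[] args) {
        CoverageMemo memo = new CoverageMemo(cp -> cp < 0x80); // pretend the font is ASCII-only
        memo.sanitize("banana banana", '?');
        System.out.println(memo.probeCount()); // 4 (b, a, n, space)
        memo.sanitize("banana banana", '?');
        System.out.println(memo.probeCount()); // 4 (no new probes on repeat)
    }
}
```

The counts mirror the memo tests named in the commit: "banana banana" has four distinct code points, so the first pass probes four times and a repeat pass probes zero times.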
Two byte-identical hot-path optimizations on the canonical font / measurement / render path. Each removes real repeated work; both are verified output-identical against the visual + snapshot suite and a width-parity check. Performance figures come from deterministic probes/counters, not wall-clock.
F4: measurement no longer embeds binary fonts into a throwaway document
PdfMeasurementResources used to subset-embed every Google/custom font family into a per-session PDDocument that was immediately discarded, and it did so again on every new DocumentSession (one per server render). Binary families now resolve to a per-thread cached PDType0Font bound to a reusable, never-saved document; opening measurement resources owns no document at all.
Widths, vertical metrics and glyph coverage stay byte-identical to the render font, because both read the same parsed TrueTypeFont. MeasurementFontParityTest checks 30 families × 4 faces, max |Δ| = 0.
Per-open embed waste drops ~94-97% (FontEmbedProbe, warm median). Google/custom-font documents only; standard-14 unaffected.
F3: glyph coverage is memoized instead of re-probed per glyph
GlyphFallbackLogger.sanitize (shared by paragraph spans, table cells, watermark + header/footer chrome, and width measurement) called PDFont.encode for every code point of every string, at measurement and again at render. That cost a String allocation per glyph and a thrown and caught exception per unencodable glyph. Coverage is now memoized per (font, code point): encode runs once per distinct glyph, then a map lookup; kept glyphs append by code point with no per-glyph String.
Output is byte-identical: same encode decision, only cached; the glyph-fallback warn cadence is unchanged. Pinned by PdfFontSanitizerTest output assertions plus memo tests (4 probes for "banana banana", 0 on repeat) counted via a test-scope counting font, with no instrumentation in the production class.
Verification
./mvnw verify -pl . reported BUILD SUCCESS: 1149 tests, 0 failures (checkstyle + SpotBugs + javadoc).
… -foe true) exit 0, no render errors.
Follow-ups (separate PRs)
BitSet-pair glyph coverage to remove the residual autobox on non-ASCII.
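One possible reading of this follow-up, sketched under assumptions: if the current memo is keyed by a boxed Integer, then lookups for code points above 127 allocate, because Integer.valueOf only caches -128 to 127 by default. A pair of BitSets indexed by code point avoids boxing entirely. `BitSetCoverage` and its `IntPredicate` probe are hypothetical names, and the actual follow-up PR may look different.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical box-free coverage memo for a single font (illustration only). */
final class BitSetCoverage {
    // Expensive check (assumed): stands in for trying to encode one code point.
    private final IntPredicate probe;
    // Bit i set in `probed`: code point i has been checked already.
    private final BitSet probed = new BitSet();
    // Bit i set in `covered`: the font can encode code point i.
    private final BitSet covered = new BitSet();

    BitSetCoverage(IntPredicate probe) {
        this.probe = probe;
    }

    /** Probes each distinct code point once; later calls read two bits and allocate nothing. */
    boolean covers(int codePoint) {
        if (!probed.get(codePoint)) {
            probed.set(codePoint);
            if (probe.test(codePoint)) {
                covered.set(codePoint);
            }
        }
        return covered.get(codePoint);
    }

    public static void main(String[] args) {
        int[] probes = {0};
        BitSetCoverage coverage = new BitSetCoverage(cp -> {
            probes[0]++;
            return cp < 0x80; // pretend the font is ASCII-only
        });
        coverage.covers(0x00E9);  // é, outside the small Integer cache range
        coverage.covers(0x00E9);
        coverage.covers(0x1F600); // a supplementary-plane code point
        System.out.println("probes = " + probes[0]); // probes = 2
    }
}
```

Two sets are needed because a single bit cannot distinguish "not yet probed" from "probed and uncovered". Each BitSet grows to cover the highest code point seen, which is at most about 136 KB per set at U+10FFFF.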