
Improve tlparse runtime by caching format_stack results #173

Merged: yushangdi merged 1 commit into main from sy_improve_runtime on Mar 12, 2026
Conversation

@yushangdi (Contributor) commented Mar 12, 2026

Summary

Cache expensive repeated computations in compilation_metrics and inductor_output_code parsers to significantly reduce parsing time on large log files.

Changes

  • Cache regex in simplify_filename (types.rs): Replace per-call Regex::new() with a static Lazy<Regex>, eliminating ~3000 regex compilations per run.
  • Cache syntect resources (parsers.rs): Load SyntaxSet and ThemeSet once via OnceLock instead of on every generate_html_output call.
  • Cache format_stack results (parsers.rs): Deduplicate repeated format_stack calls using a FxHashMap<(StackSummary, String), String> cache. Reduces ~3000 expensive trie-building calls to ~20 unique computations for the worst-case entries.
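The two caching patterns above can be sketched in std-only Rust. This is a simplified illustration, not tlparse's actual API: the PR uses `Lazy<Regex>`, `FxHashMap`, and syntect's `SyntaxSet`/`ThemeSet`, while the `StackSummary` alias, `format_stack` body, and `expensive_resource` loader below are stand-ins.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Stand-in for tlparse's StackSummary; the real type derives Hash/Eq/Clone.
type StackSummary = Vec<String>;

// Stand-in for the expensive trie-building formatter in parsers.rs.
fn format_stack(stack: &StackSummary, caption: &str) -> String {
    format!("<details><summary>{}</summary>{}</details>", caption, stack.join("\n"))
}

// Pattern 1: load an expensive shared resource exactly once (as the PR does
// for the syntect SyntaxSet/ThemeSet via OnceLock).
fn expensive_resource() -> &'static String {
    static RESOURCE: OnceLock<String> = OnceLock::new();
    RESOURCE.get_or_init(|| "loaded syntax/theme sets".to_string())
}

// Pattern 2: memoize format_stack behind a (stack, caption) key, so repeated
// identical calls hit the map instead of rebuilding the trie.
fn format_stack_cached(stack: &StackSummary, caption: &str) -> String {
    static CACHE: OnceLock<Mutex<HashMap<(StackSummary, String), String>>> = OnceLock::new();
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let key = (stack.clone(), caption.to_string());
    let mut guard = cache.lock().unwrap();
    if let Some(hit) = guard.get(&key) {
        return hit.clone();
    }
    let result = format_stack(stack, caption);
    guard.insert(key, result.clone());
    result
}
```

The same idea applies to the `simplify_filename` regex: compile once into a static, reuse on every call, instead of ~3000 `Regex::new()` compilations.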

Benchmark (release build, 258 MB log, 44,826 entries, 3 runs each) on my MacBook Air. This log has a lot of symbols and guards added (300+).

| Metric                 | Before | After | Improvement |
|------------------------|--------|-------|-------------|
| Wall clock (median)    | 15.43s | 9.27s | 1.66x       |
| CPU user time (median) | 14.09s | 7.04s | 2.00x       |

Parser-level breakdown (after fixes):

| Parser               | Time | % of total |
|----------------------|------|------------|
| inductor_output_code | 4.3s | 57%        |
| compilation_metrics  | 1.4s | 19%        |
| regex + json + I/O   | 1.4s | 18%        |

Currently inductor_output_code takes a long time, and that's mostly spent on syntax highlighting.

If we use --plain-text to skip highlighting:

| Metric        | Before | After | Improvement |
|---------------|--------|-------|-------------|
| Wall clock    | 10.72s | 3.73s | 2.87x       |
| CPU user time | 9.39s  | 2.53s | 3.71x       |

Test plan

  • cargo test passes
  • Verified output file sets are identical between baseline and fixed builds (diff on output directories shows no differences)
  • Benchmarked on 258MB production log file with 44,826 structured log entries

@meta-cla meta-cla bot added the cla signed label Mar 12, 2026
@yushangdi yushangdi force-pushed the sy_improve_runtime branch 3 times, most recently from e1aa79e to bcb3702 on March 12, 2026 00:56
@yushangdi yushangdi marked this pull request as ready for review March 12, 2026 01:03
@yushangdi yushangdi requested a review from ezyang March 12, 2026 01:03
@yushangdi yushangdi changed the title from "Sy improve runtime" to "Improve tlparse runtime by caching format_stack results" Mar 12, 2026
@yushangdi yushangdi requested a review from aorenste March 12, 2026 01:12
@aorenste (Contributor) left a comment:

As a follow-on you could use rayon to do the syntax highlighting in parallel to speed it up quite a bit.
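The reviewer's follow-on idea (rayon's `par_iter` over the files to highlight) can be sketched without any external crate using `std::thread::scope`; `highlight` here is a hypothetical stand-in for the syntect call, and the fixed chunk count is an arbitrary choice for illustration.

```rust
use std::thread;

// Hypothetical stand-in for per-file syntax highlighting; the real code would
// call into syntect here (and rayon would replace the manual scoped threads).
fn highlight(source: &str) -> String {
    format!("<pre>{}</pre>", source)
}

fn highlight_all(sources: &[String]) -> Vec<String> {
    if sources.is_empty() {
        return Vec::new();
    }
    // Split the inputs into up to four chunks, one scoped thread per chunk;
    // collecting the handles in order keeps results in input order.
    let chunk_size = sources.len().div_ceil(4);
    thread::scope(|s| {
        let handles: Vec<_> = sources
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|src| highlight(src)).collect::<Vec<_>>())
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

With rayon this whole function collapses to `sources.par_iter().map(|s| highlight(s)).collect()`, which also load-balances better than fixed chunks.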

Approving w/ comments.

caption: &str,
open: bool,
) -> String {
let key = (stack.clone(), caption.to_string());

You're not including open in the cache key? Should probably include it - but it looks like it's always false? So you could just remove it from the function.

@yushangdi (Contributor Author) replied:

good catch! removing it

caption: &str,
open: bool,
) -> String {
let key = (stack.clone(), caption.to_string());

I don't like these clones, but unfortunately HashMap has a limitation that requires it for a tuple key. If we find that this is a perf bottleneck later we could switch to a 2-level cache (i.e. FxHashMap<StackSummary, FxHashMap<String, String>>) which wouldn't require us to clone the key - but that's probably not necessary.
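The 2-level cache the reviewer describes can be sketched as follows, using std `HashMap` in place of `FxHashMap` (same API, different hasher). The `lookup_or_insert` helper and the `StackSummary` alias are illustrative, not tlparse's code; the point is that the keys are cloned only on a miss, never on a hit.

```rust
use std::collections::HashMap;

// Illustrative stand-in for tlparse's StackSummary.
type StackSummary = Vec<String>;

// Outer map keyed by stack, inner map keyed by caption. A cache hit borrows
// both keys and clones nothing but the cached value; key clones happen only
// on first insertion.
fn lookup_or_insert(
    cache: &mut HashMap<StackSummary, HashMap<String, String>>,
    stack: &StackSummary,
    caption: &str,
    compute: impl FnOnce() -> String,
) -> String {
    if let Some(hit) = cache.get(stack).and_then(|inner| inner.get(caption)) {
        return hit.clone();
    }
    let result = compute();
    cache
        .entry(stack.clone())
        .or_default()
        .insert(caption.to_string(), result.clone());
    result
}
```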

return cached.clone();
}
let result = format_stack(stack, caption, open);
cache.insert(key, result.clone());

We could avoid this clone by returning the String the cache holds as a &str - but again... probably overkill.
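That &str-returning variant can be sketched with the entry API; `format_cached` and `compute` are hypothetical names, and note the trade-off that this version still allocates the key `String` on every call even on a hit.

```rust
use std::collections::HashMap;

// Returning a borrow of the cached String instead of cloning it:
// entry().or_insert_with() yields a &mut String tied to the cache borrow,
// which deref-coerces to &str in the return position.
fn format_cached<'a>(
    cache: &'a mut HashMap<String, String>,
    caption: &str,
    compute: impl FnOnce() -> String,
) -> &'a str {
    cache.entry(caption.to_string()).or_insert_with(compute)
}
```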

src/parsers.rs (outdated)
false,
),
.map(|spec| {
let user_stack = spec.user_stack.unwrap_or(Vec::new());

nit: only create the Vec if the unwrap fails (here and all the unwrap_or below):

Suggested change:
- let user_stack = spec.user_stack.unwrap_or(Vec::new());
+ let user_stack = spec.user_stack.unwrap_or_else(|| Vec::new());

@yushangdi yushangdi force-pushed the sy_improve_runtime branch from bcb3702 to a2ce87a on March 12, 2026 16:19
@yushangdi yushangdi merged commit 58d9c3c into main Mar 12, 2026
19 checks passed