
Improve tlparse runtime by caching format_stack results #173

Merged: yushangdi merged 1 commit into main from sy_improve_runtime on Mar 12, 2026
Conversation

@yushangdi (Contributor) commented Mar 12, 2026

Summary

Cache expensive repeated computations in compilation_metrics and inductor_output_code parsers to significantly reduce parsing time on large log files.

Changes

  • Cache regex in simplify_filename (types.rs): Replace per-call Regex::new() with a static Lazy<Regex>, eliminating ~3000 regex compilations per run.
  • Cache syntect resources (parsers.rs): Load SyntaxSet and ThemeSet once via OnceLock instead of on every generate_html_output call.
  • Cache format_stack results (parsers.rs): Deduplicate repeated format_stack calls using a FxHashMap<(StackSummary, String), String> cache. Reduces ~3000 expensive trie-building calls to ~20 unique computations for the worst-case entries.
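The two caching patterns above can be sketched in std-only Rust. This is a simplified illustration, not tlparse's actual API: the PR uses `Lazy<Regex>`, `FxHashMap`, and syntect's `SyntaxSet`/`ThemeSet`, while the `StackSummary` alias, `format_stack` body, and `expensive_resource` loader below are stand-ins.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Stand-in for tlparse's StackSummary; the real type derives Hash/Eq/Clone.
type StackSummary = Vec<String>;

// Stand-in for the expensive trie-building formatter in parsers.rs.
fn format_stack(stack: &StackSummary, caption: &str) -> String {
    format!("<details><summary>{}</summary>{}</details>", caption, stack.join("\n"))
}

// Pattern 1: load an expensive shared resource exactly once (as the PR does
// for the syntect SyntaxSet/ThemeSet via OnceLock).
fn expensive_resource() -> &'static String {
    static RESOURCE: OnceLock<String> = OnceLock::new();
    RESOURCE.get_or_init(|| "loaded syntax/theme sets".to_string())
}

// Pattern 2: memoize format_stack behind a (stack, caption) key, so repeated
// identical calls hit the map instead of rebuilding the trie.
fn format_stack_cached(stack: &StackSummary, caption: &str) -> String {
    static CACHE: OnceLock<Mutex<HashMap<(StackSummary, String), String>>> = OnceLock::new();
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let key = (stack.clone(), caption.to_string());
    let mut guard = cache.lock().unwrap();
    if let Some(hit) = guard.get(&key) {
        return hit.clone();
    }
    let result = format_stack(stack, caption);
    guard.insert(key, result.clone());
    result
}
```

The same idea applies to the `simplify_filename` regex: compile once into a static, reuse on every call, instead of ~3000 `Regex::new()` compilations.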

Benchmark (release build, 258 MB log, 44,826 entries, 3 runs each) on my MacBook Air. This log has a lot of symbols and guards added (300+).

| Metric                 | Before | After | Improvement |
|------------------------|--------|-------|-------------|
| Wall clock (median)    | 15.43s | 9.27s | 1.66x       |
| CPU user time (median) | 14.09s | 7.04s | 2.00x       |

Parser-level breakdown (after fixes):

| Parser               | Time | % of total |
|----------------------|------|------------|
| inductor_output_code | 4.3s | 57%        |
| compilation_metrics  | 1.4s | 19%        |
| regex + json + I/O   | 1.4s | 18%        |

Currently inductor_output_code takes a long time, and that's mostly spent on syntax highlighting.

If we use --plain-text to skip highlighting:

| Metric        | Before | After | Improvement |
|---------------|--------|-------|-------------|
| Wall clock    | 10.72s | 3.73s | 2.87x       |
| CPU user time | 9.39s  | 2.53s | 3.71x       |

Test plan

  • cargo test passes
  • Verified output file sets are identical between baseline and fixed builds (diff on output directories shows no differences)
  • Benchmarked on 258MB production log file with 44,826 structured log entries

@meta-cla meta-cla bot added the cla signed label Mar 12, 2026
@yushangdi yushangdi force-pushed the sy_improve_runtime branch 3 times, most recently from e1aa79e to bcb3702 on March 12, 2026 00:56
@yushangdi yushangdi marked this pull request as ready for review March 12, 2026 01:03
@yushangdi yushangdi requested a review from ezyang March 12, 2026 01:03
@yushangdi yushangdi changed the title from "Sy improve runtime" to "Improve tlparse runtime by caching format_stack results" Mar 12, 2026
@yushangdi yushangdi requested a review from aorenste March 12, 2026 01:12
@aorenste (Contributor) left a comment:

As a follow-on you could use rayon to do the syntax highlighting in parallel to speed it up quite a bit.
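The reviewer's follow-on idea (rayon's `par_iter` over the files to highlight) can be sketched without any external crate using `std::thread::scope`; `highlight` here is a hypothetical stand-in for the syntect call, and the fixed chunk count is an arbitrary choice for illustration.

```rust
use std::thread;

// Hypothetical stand-in for per-file syntax highlighting; the real code would
// call into syntect here (and rayon would replace the manual scoped threads).
fn highlight(source: &str) -> String {
    format!("<pre>{}</pre>", source)
}

fn highlight_all(sources: &[String]) -> Vec<String> {
    if sources.is_empty() {
        return Vec::new();
    }
    // Split the inputs into up to four chunks, one scoped thread per chunk;
    // collecting the handles in order keeps results in input order.
    let chunk_size = sources.len().div_ceil(4);
    thread::scope(|s| {
        let handles: Vec<_> = sources
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|src| highlight(src)).collect::<Vec<_>>())
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

With rayon this whole function collapses to `sources.par_iter().map(|s| highlight(s)).collect()`, which also load-balances better than fixed chunks.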

Approving w/ comments.

caption: &str,
open: bool,
) -> String {
let key = (stack.clone(), caption.to_string());

You're not including open in the cache key? Should probably include it - but it looks like it's always false? So you could just remove it from the function.

@yushangdi (Contributor Author) replied:

good catch! removing it

caption: &str,
open: bool,
) -> String {
let key = (stack.clone(), caption.to_string());

I don't like these clones, but unfortunately HashMap has a limitation that requires it for a tuple key. If we find that this is a perf bottleneck later we could switch to a 2-level cache (i.e. FxHashMap<StackSummary, FxHashMap<String, String>>) which wouldn't require us to clone the key - but that's probably not necessary.
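The 2-level cache the reviewer describes can be sketched as follows, using std `HashMap` in place of `FxHashMap` (same API, different hasher). The `lookup_or_insert` helper and the `StackSummary` alias are illustrative, not tlparse's code; the point is that the keys are cloned only on a miss, never on a hit.

```rust
use std::collections::HashMap;

// Illustrative stand-in for tlparse's StackSummary.
type StackSummary = Vec<String>;

// Outer map keyed by stack, inner map keyed by caption. A cache hit borrows
// both keys and clones nothing but the cached value; key clones happen only
// on first insertion.
fn lookup_or_insert(
    cache: &mut HashMap<StackSummary, HashMap<String, String>>,
    stack: &StackSummary,
    caption: &str,
    compute: impl FnOnce() -> String,
) -> String {
    if let Some(hit) = cache.get(stack).and_then(|inner| inner.get(caption)) {
        return hit.clone();
    }
    let result = compute();
    cache
        .entry(stack.clone())
        .or_default()
        .insert(caption.to_string(), result.clone());
    result
}
```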

return cached.clone();
}
let result = format_stack(stack, caption, open);
cache.insert(key, result.clone());

We could avoid this clone by returning the String the cache holds as a &str - but again... probably overkill.
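That &str-returning variant can be sketched with the entry API; `format_cached` and `compute` are hypothetical names, and note the trade-off that this version still allocates the key `String` on every call even on a hit.

```rust
use std::collections::HashMap;

// Returning a borrow of the cached String instead of cloning it:
// entry().or_insert_with() yields a &mut String tied to the cache borrow,
// which deref-coerces to &str in the return position.
fn format_cached<'a>(
    cache: &'a mut HashMap<String, String>,
    caption: &str,
    compute: impl FnOnce() -> String,
) -> &'a str {
    cache.entry(caption.to_string()).or_insert_with(compute)
}
```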

src/parsers.rs (outdated)
false,
),
.map(|spec| {
let user_stack = spec.user_stack.unwrap_or(Vec::new());

nit: only create the Vec if the unwrap fails (here and all the unwrap_or below):

Suggested change:
- let user_stack = spec.user_stack.unwrap_or(Vec::new());
+ let user_stack = spec.user_stack.unwrap_or_else(|| Vec::new());

@yushangdi yushangdi force-pushed the sy_improve_runtime branch from bcb3702 to a2ce87a on March 12, 2026 16:19
@yushangdi yushangdi merged commit 58d9c3c into main Mar 12, 2026
19 checks passed