
some perf and documentation changes #355

Open
hackaugusto wants to merge 20 commits into jonhoo:main from hackaugusto:improvements

Conversation

@hackaugusto

I made these changes while reading the code and trying to implement #342. Unfortunately this PR doesn't achieve that, but it does add a 5% to 10% perf gain (criterion runs vary a lot on my machine), along with some docs and hopefully some nice small improvements.


@hackaugusto hackaugusto left a comment


Left some comments to give context on the changes; this turned out to be a bigger change than I anticipated.

pub(super) frames: Vec<TimedFrame<'a>>,
pub(super) accumulated_samples: usize,
pub(super) delta_max: usize,
}

I replaced the HashMap with a Vec stack, because that matches how the flow works (it pops all frames that were in previous but are no longer in current, and pushes the new frames from current). This meant there was no need for Frame to act as a key, so all of its fields could move into TimedFrame. This gave a small perf improvement.
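The stack-based flow can be sketched roughly like this (a minimal illustration with simplified names; `common_prefix_len` is hypothetical, not the PR's actual code): the shared prefix of the previous and current stacks stays open, everything past it in `previous` is popped and closed, and everything past it in `current` is pushed.

```rust
/// Length of the shared prefix of two stacks; frames past this point in
/// `prev` get popped (closed), frames past it in `curr` get pushed (opened).
fn common_prefix_len(prev: &[&str], curr: &[&str]) -> usize {
    prev.iter().zip(curr.iter()).take_while(|(a, b)| a == b).count()
}

fn main() {
    let prev = ["main", "parse", "read"];
    let curr = ["main", "parse", "write"];
    let shared = common_prefix_len(&prev, &curr);
    // "read" is popped from the open stack, "write" is pushed onto it
    assert_eq!(shared, 2);
}
```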

open_frames: &mut Vec<TimedFrame<'a>>,
closed_frames: &mut Vec<TimedFrame<'a>>,
previous_frames: &[&'a str],
current_frames: &[&'a str],

Changing the inputs from iterators to slices gave a few benefits:

  • the caller can split each line on ; once and cache the result for the next iteration
  • this function can slice into the frames and use extend, which seems to give a small perf improvement
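A minimal sketch of the caching idea (names assumed, not the PR's actual signatures): the caller keeps one `Vec<&str>` and refills it in place, so each line is split on `;` exactly once and the allocation is reused across iterations.

```rust
/// Refill a reusable buffer with the frames of one folded-stack line.
/// The Vec's allocation survives across calls, so repeated lines only
/// pay for the split, not for a fresh allocation.
fn split_frames<'a>(line: &'a str, buf: &mut Vec<&'a str>) {
    buf.clear();
    buf.extend(line.split(';'));
}

fn main() {
    let mut current: Vec<&str> = Vec::new();
    split_frames("main;parse;read", &mut current);
    assert_eq!(current, ["main", "parse", "read"]);
    // the same Vec is reused for the next line, saving an allocation
    split_frames("main;parse;write", &mut current);
    assert_eq!(current, ["main", "parse", "write"]);
}
```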

let mut delta = None;
let mut delta_max = 1;
let mut stripped_fractional_samples = false;
let mut prev_line = None;

The idea was to make frames into an Iterator, so I was dropping some of the stack variables to make the to-be-implemented FramesIterator smaller.

The two main changes here are:

  • open_frames and closed_frames are Vecs of the same type, as opposed to tmp and frames, which were a HashMap and a Vec. This is because the flow now uses open_frames as a stack of the shared frames, and pushes the finalized frames to closed_frames
  • the code caches the parsed line into the previous variable, and the previous and current Vecs are reused to save allocations. This also removed the need to special-case the first and last iterations with a None for an iterator
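The buffer-reuse part can be illustrated with `std::mem::swap` (a simplified sketch, not the PR's actual code): after a line is parsed into `current`, the two Vecs trade places, so `previous` holds the parsed line for the next iteration and both allocations are kept alive.

```rust
use std::mem;

fn main() {
    let mut previous: Vec<&str> = vec!["main", "parse"];
    let mut current: Vec<&str> = Vec::new();
    // parse the next line into `current`...
    current.extend("main;render".split(';'));
    // ...then swap so `previous` caches the parsed line for the next
    // iteration, reusing both allocations instead of reallocating.
    mem::swap(&mut previous, &mut current);
    current.clear();
    assert_eq!(previous, ["main", "render"]);
    assert!(current.is_empty());
}
```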

);
current.extend(iter::once("").chain(line.split(';')));

if !SUPPRESS_SORT_CHECK {

Making this a const gave a small perf improvement.
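A sketch of the pattern (the flag name mirrors the PR; the surrounding code is invented for illustration): with a `const`, the branch is resolved at compile time, so the optimizer can remove the sort check entirely when the flag is set.

```rust
// Compile-time flag: a `const` lets the compiler eliminate the branch
// (and the check's code) entirely, instead of testing a runtime value.
const SUPPRESS_SORT_CHECK: bool = false;

/// Illustrative check that folded-stack lines arrive in sorted order.
fn is_sorted(lines: &[&str]) -> bool {
    lines.windows(2).all(|w| w[0] <= w[1])
}

fn main() {
    let lines = ["a;b 1", "a;c 2"];
    if !SUPPRESS_SORT_CHECK {
        assert!(is_sorted(&lines));
    }
}
```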

reversed.push(stack);
}
let mut reversed: Vec<&str> = reversed.iter().collect();
reversed.sort_unstable();

Moved this to a utility. The main change is that the function returns a Vec<String> instead of using StrStack. The idea is that reversed.push(stack) copies the data and may reallocate, and as more data is added the cost of reallocating grows. Not sure if it is worth it, though.
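A hypothetical shape for such a utility (the PR's real code may differ): reverse the frames of each folded-stack line and collect owned `String`s, so the result can be sorted directly without going through a StrStack.

```rust
/// Reverse the frame order of one folded-stack line ("a;b;c N" -> "c;b;a N"),
/// keeping the trailing sample count in place.
fn reverse_stack(line: &str) -> String {
    let (frames, count) = line.rsplit_once(' ').unwrap_or((line, ""));
    let mut parts: Vec<&str> = frames.split(';').collect();
    parts.reverse();
    format!("{} {}", parts.join(";"), count)
}

fn main() {
    let mut reversed: Vec<String> = ["main;parse 3", "main;render 2"]
        .iter()
        .map(|l| reverse_stack(l))
        .collect();
    reversed.sort_unstable();
    assert_eq!(reversed, ["parse;main 3", "render;main 2"]);
}
```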

@@ -398,54 +400,35 @@ where
I: IntoIterator<Item = &'a str>,

This is the biggest blocker: to make this work with big files we would at least need to change the public API and add + Clone, so that we can do two passes.

I would still like to add an API to handle big files; in my case it would be enough to have another API that receives parsed entries together with the total time and delta max. What do you think?
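The `+ Clone` idea could look roughly like this (a sketch under the assumption that the first pass only needs totals; `total_samples` is a made-up name, not a proposed signature):

```rust
/// First pass over the input: sum the trailing sample counts. Requiring
/// `I: Clone` keeps the original iterator available for a second pass
/// that would emit the actual output.
fn total_samples<'a, I>(lines: I) -> usize
where
    I: IntoIterator<Item = &'a str> + Clone,
{
    lines
        .clone()
        .into_iter()
        .filter_map(|l| l.rsplit_once(' ')?.1.parse::<usize>().ok())
        .sum()
}

fn main() {
    let lines = vec!["a 2", "a;b;c 16", "a;b 2"];
    assert_eq!(total_samples(lines.iter().copied()), 20);
    // second pass would reuse `lines` here
}
```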


@hackaugusto hackaugusto Feb 26, 2026


To add to the API idea, I was thinking about something like this:

  1. for a text format:
document := (comment | line)*
comment := "#" [^\n\r]*
line := <depth> <frame> <pct_value> <pct_delta>?

The rules would be: the depth either increases by 1 or decreases to at most the current depth, and the pct_* values have to add up to 100. So something like:

a 2
a;b;c 16
a;b 2

would become:

1 a 10
2 b 0
3 c 80
2 b 10

and the programmatic API would take an IntoIterator<Item=&Entry> where:

struct Entry<'a> {
    pub depth: usize,
    pub function: &'a str,
    pub pct_value: usize,
    pub pct_delta: Option<usize>,
}
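A quick sketch of the invariants described above (a hypothetical validator, not part of the proposal itself): depth may only grow by exactly 1 or fall back to a shallower level, and the pct values must sum to 100.

```rust
#[allow(dead_code)]
struct Entry<'a> {
    pub depth: usize,
    pub function: &'a str,
    pub pct_value: usize,
    pub pct_delta: Option<usize>,
}

/// Check the proposed invariants: depth starts at 1, grows by at most 1
/// per entry, and the pct values add up to exactly 100.
fn validate(entries: &[Entry]) -> bool {
    let mut depth = 0;
    let mut total = 0;
    for e in entries {
        if e.depth == 0 || e.depth > depth + 1 {
            return false;
        }
        depth = e.depth;
        total += e.pct_value;
    }
    total == 100
}

fn main() {
    // the worked example from the comment above
    let entries = [
        Entry { depth: 1, function: "a", pct_value: 10, pct_delta: None },
        Entry { depth: 2, function: "b", pct_value: 0, pct_delta: None },
        Entry { depth: 3, function: "c", pct_value: 80, pct_delta: None },
        Entry { depth: 2, function: "b", pct_value: 10, pct_delta: None },
    ];
    assert!(validate(&entries));
}
```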
