Some perf and documentation changes #355
Conversation
hackaugusto
left a comment
Left some comments to give context on the changes, this turned out to be a bigger change than I anticipated.
```rust
pub(super) frames: Vec<TimedFrame<'a>>,
pub(super) accumulated_samples: usize,
pub(super) delta_max: usize,
}
```
I replaced the HashMap with a Vec used as a stack, because that seems to be how `flow` works: it pops all values that were in `previous` but are no longer in `current`, and pushes the new values from `current`. This meant there was no need for the frame to act as a key, so we can move all the fields into `TimedFrame`. This had a small perf improvement.
```rust
open_frames: &mut Vec<TimedFrame<'a>>,
closed_frames: &mut Vec<TimedFrame<'a>>,
previous_frames: &[&'a str],
current_frames: &[&'a str],
```
Changing the inputs from iterators to slices gave a few benefits:
- the caller can split the frames by `;` once, and cache the result for the next iteration
- this function can slice into the frames, and use `extend`, which seems to give a small perf improvement
```rust
let mut delta = None;
let mut delta_max = 1;
let mut stripped_fractional_samples = false;
let mut prev_line = None;
```
The idea was to make `frames` into an `Iterator`, so I was dropping some of the stack variables to make the to-be-implemented `FramesIterator` smaller.
The two main changes here are:
- `open_frames` and `closed_frames` are `Vec`s of the same type, as opposed to `tmp` and `frames`, which were a HashMap and a Vec. This is because `flow` now uses `open_frames` as a stack of the shared frames, and pushes the finalized frames to `closed_frames`
- the code caches the parsed line into the `previous` variable, and the `previous` and `current` vecs are reused to save allocations. This also removed the need to special-case the first and last iterations with a `None` for an iterator.
```rust
);
current.extend(iter::once("").chain(line.split(';')));
```

```rust
if !SUPPRESS_SORT_CHECK {
```
Making this `const` gave a small perf improvement.
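The likely reason for the gain: with a `const` flag, the branch condition is known at compile time, so the compiler can remove the check (and the dead block) entirely. A minimal sketch, with a hypothetical `is_sorted` helper:

```rust
// When this is a const, `if !SUPPRESS_SORT_CHECK { ... }` is resolved at
// compile time; with the flag set to true the whole block is eliminated.
const SUPPRESS_SORT_CHECK: bool = false;

fn is_sorted(frames: &[&str]) -> bool {
    frames.windows(2).all(|w| w[0] <= w[1])
}

fn main() {
    let frames = ["a;1", "a;b;2"];
    if !SUPPRESS_SORT_CHECK {
        // Statically known to run in this configuration.
        assert!(is_sorted(&frames));
    }
}
```

A runtime `bool` field would instead force a branch on every iteration.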
```rust
reversed.push(stack);
}
let mut reversed: Vec<&str> = reversed.iter().collect();
reversed.sort_unstable();
```
Moved this to a utility. The main change is that instead of using `StrStack` the function returns a `Vec<String>`. The idea is that `reversed.push(stack)` copies the data and maybe reallocates; as more data gets added, the cost of reallocating increases. Not sure if it is worth it, though.
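A rough sketch of what such a utility could look like (hypothetical name and line format, assuming `stack SAMPLES` lines whose frame order gets reversed before sorting):

```rust
// Reverse each stack, producing owned Strings, then sort them.
fn reverse_stacks(lines: &[&str]) -> Vec<String> {
    let mut reversed: Vec<String> = lines
        .iter()
        .filter_map(|l| l.rsplit_once(' '))
        .map(|(stack, samples)| {
            let mut frames: Vec<&str> = stack.split(';').collect();
            frames.reverse();
            format!("{} {}", frames.join(";"), samples)
        })
        .collect();
    reversed.sort_unstable();
    reversed
}

fn main() {
    let out = reverse_stacks(&["a;b;c 1", "a;d 2"]);
    assert_eq!(out, vec!["c;b;a 1", "d;a 2"]);
}
```

Each `String` here is an independent allocation, so growing the `Vec` only moves pointers instead of copying all the string bytes the way one large append-only buffer would.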
```diff
@@ -398,54 +400,35 @@ where
    I: IntoIterator<Item = &'a str>,
```
This is the biggest blocker: to make this work with big files, at a minimum we would need to change the public API and add `+ Clone`, so we can do two passes.
I would still like to add an API to handle big files; in my case it would work to have another API that receives parsed entries with the total time and delta max. What do you think?
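A sketch of what the `+ Clone` bound enables (hypothetical function; the point is only that cloning the iterator allows a first pass for totals without consuming the input):

```rust
fn two_pass<'a, I>(lines: I) -> (usize, usize)
where
    I: IntoIterator<Item = &'a str>,
    I::IntoIter: Clone, // the extra bound that would change the public API
{
    let iter = lines.into_iter();

    // First pass: accumulate the total sample count.
    let total: usize = iter
        .clone()
        .filter_map(|l| l.rsplit_once(' '))
        .filter_map(|(_, n)| n.parse::<usize>().ok())
        .sum();

    // Second pass: the clone left the original iterator untouched.
    let line_count = iter.count();
    (total, line_count)
}

fn main() {
    let (total, lines) = two_pass(["a;b 2", "a;b;c 16", "a;b 2"]);
    assert_eq!(total, 20);
    assert_eq!(lines, 3);
}
```

Note this only helps when re-iterating is cheap; for a large file read from disk, cloning the iterator would mean reading the file twice.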
To add to the API idea, I was thinking about something like this:
- for a text format:

```
document := (comment | line)*
comment  := "#" [^\n\r]*
line     := <depth> <frame> <pct_value> <pct_delta>?
```
The rules would be: depth either increases by 1 or decreases by any amount, and the `pct_*` values have to add up to 100. So something like:
```
a; 2
a;b;c 16
a;b 2
```
would become:
```
1 a 10
2 b 0
3 c 80
2 b 10
```
and the programmatic API would take an `IntoIterator<Item = &Entry>` where:

```rust
struct Entry<'a> {
    pub depth: usize,
    pub function: &'a str,
    pub pct_value: usize,
    pub pct_delta: Option<usize>,
}
```
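To illustrate the proposal, here is a hypothetical consumer of such entries, checking the "percentages add up to 100" invariant on the example above without ever holding the whole input in memory:

```rust
// The proposed entry type, as sketched in the comment above.
struct Entry<'a> {
    pub depth: usize,
    pub function: &'a str,
    pub pct_value: usize,
    pub pct_delta: Option<usize>,
}

// Hypothetical consumer: streams entries and validates the total.
fn check_totals<'a>(entries: impl IntoIterator<Item = Entry<'a>>) -> bool {
    entries.into_iter().map(|e| e.pct_value).sum::<usize>() == 100
}

fn main() {
    // The worked example from the comment: a 10, b 0, c 80, b 10.
    let entries = vec![
        Entry { depth: 1, function: "a", pct_value: 10, pct_delta: None },
        Entry { depth: 2, function: "b", pct_value: 0, pct_delta: None },
        Entry { depth: 3, function: "c", pct_value: 80, pct_delta: None },
        Entry { depth: 2, function: "b", pct_value: 10, pct_delta: None },
    ];
    assert!(check_totals(entries));
}
```

Because totals are pre-computed by the producer, a consumer like this never needs a second pass over the input.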
I performed these changes while reading the code and trying to implement #342. Unfortunately this doesn't achieve that, but it does add a 5% to 10% perf gain (Criterion runs vary a lot on my machine), and it adds some docs and hopefully some nice small improvements.