perf: persistent buffer with colored pens #16
Merged
Avoid duplication of effort for cube and visited computation and allocation respectively.
Allocating and zeroing the visit buffer can be a significant cost. Even though the algorithm usually touches only a small part of the volume, zeroing requires touching every voxel, so each query on a given shape costs O(V) and K queries cost O(V * K), where V is the number of voxels and K is the number of queries. Using a std::vector<bool> reduced the cost 8x, and when we implemented 2x2x2, that reduced the cost 64x, but we can do better.
Instead of zeroing the buffer each time, let's use std::vector<uint8_t>, zero it to start, and then draw in a different color for each visit. Then we only need to zero it once every 255 times.
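A minimal sketch of the colored-pen idea, for illustration only. The class name `ColoredVisitBuffer` and its methods are hypothetical and are not this PR's API; the sketch assumes color 0 means "never visited" and that `begin_query()` is called before each query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the PR's actual code): a visit buffer that
// persists across queries and is zeroed only when the colors run out.
class ColoredVisitBuffer {
public:
    explicit ColoredVisitBuffer(std::size_t voxels)
        : marks_(voxels, 0), color_(0), full_clears_(0) {}

    // Must be called before each query. Advances to the next pen color.
    // Colors 1..255 are usable; 0 is reserved for "never visited".
    void begin_query() {
        if (color_ == 255) {
            // All colors used: pay the O(V) zeroing cost once.
            std::fill(marks_.begin(), marks_.end(), std::uint8_t{0});
            color_ = 0;
            ++full_clears_;
        }
        ++color_;
    }

    // A voxel counts as visited only if it carries the current color,
    // so marks left over from earlier queries are ignored.
    bool visited(std::size_t i) const {
        assert(i < marks_.size());
        return marks_[i] == color_;
    }

    void visit(std::size_t i) {
        assert(i < marks_.size());
        marks_[i] = color_;
    }

    // Number of times the whole buffer has been re-zeroed.
    std::size_t full_clears() const { return full_clears_; }

private:
    std::vector<std::uint8_t> marks_;
    std::uint8_t color_;
    std::size_t full_clears_;
};
```

In this sketch the first 255 calls to `begin_query()` never touch the full buffer; the 256th triggers the single O(V) clear and starts over at color 1.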
The memory grows 8x vs std::vector<bool>, but the amortized cost of zeroing is now about 31.9 times lower (255 / 8).
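One way to read the 255 / 8 figure, assuming the zeroing cost is proportional to the number of bytes written, one bit per entry in the bit-packed vector, and one byte per entry in the uint8_t buffer (V entries in both cases):

$$
\frac{\text{bytes zeroed per query (bit-packed)}}{\text{bytes zeroed per query (colored, amortized)}}
= \frac{V/8}{V/255}
= \frac{255}{8}
\approx 31.9
$$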
This behavior is opt-in, as you need to intentionally allocate and deallocate the buffer. If you forget, the buffer will automatically grow as needed, but you'll have a memory leak.
I also tried experimenting with pre-computing all the "cubes", but since most of the volume is usually untouched, and most planes along a skeleton don't intersect much, there is a negative impact to doing so.