You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The function src/distance/edge.hpp::found::ConnectedComponentsAlgorithm takes a long time to run on a standard image. This is partly due to:
Use of standard template objects (std::map, std::unordered_map, etc.) that both allocate heap data AND also do complex things with it.
The nature of the return type, which is an std::vector whose size does not need to be added to after.
Each element of the return type is a group of connected pixels. It is a struct that contains:
a) The "top-left" and "bottom-right", or the pixels that define a box that contains all the pixels in the group. These are 2 Vec2, which is inefficient because all pixel locations can be defined via uint64_t as the x and y coordinates, and not decimal (Floating Point types take a lot more math to store and do calculations around).
b) An unordered_set (hash set) of all the pixels as Vec2. Same inefficiency.
Note
For context, componentPoints used to be a std::map that mapped from pixel index to its group number (L). Of course, not every pixel will have a value (because not all pixels belong to a group), but by instead using a size-constant heap array as the map (much bigger than the internal memory used by std::map, the runtime decreased by half, even though memory usage went up.
Potential Solutions
1) Standard Template Objects
At the beginning of the algorithm we define 3 types:
components: Maps from group number to group (from return type)
equivalencies: Maps group numbers to equivalent group numbers.
componentPoints: Maps from pixel index to group number (already optimized).
The first two fields can be optimized. You will notice that their maximum capacity would be the number of possible groups, and the key is always the group number (at the moment, the group number starts at 1, but that can always be changed to be zero based. I don't recommend changing this so you take advantage of zero initialization). Thus, you can convert both fields to arrays + size trackers if you know their maximum capacity. You may conduct a proof on this, but the maximum size would be the ceiling of image area / 4, or
$$\left\lceil \frac{wh}{4} \right\rceil$$
where $w$ and $h$ are the image width and height.
Tip
You don't need to invoke std::ceil for this. Think of how you can natively use / and % operators together.
2) Returning an std::vector with non-increasing size
This eliminates need for a vector really, and if the user doesn't change the size of the array, we should just return a struct with the array and its size (still has to be heap-allocated). This could be integrated with the previous step!
3) Return Type's Element Type
A few optimizations can be performed here:
Make an internal struct definition for a Vec2 (do not call it that) that uses uint32_t as the underlying type and substitute it throughout the return type. Also provide the maximum size to std::unordered_set (the image's area) via std::unordered_set::reserve(size_t count)
Eliminate the std::unordered_set from the return type and just return the std::moveed componentPoints along with the metadata on each group. You will need to modify the return type in this case to be a struct, and properly set equivalent groups in a parallel manner that's being done in the algorithm within the componentPoints.
Description
The function
src/distance/edge.hpp::found::ConnectedComponentsAlgorithmtakes a long time to run on a standard image. This is partly due to:std::map,std::unordered_map, etc.) that both allocate heap data AND also do complex things with it.std::vectorwhose size does not need to be added to after.structthat contains:a) The "top-left" and "bottom-right", or the pixels that define a box that contains all the pixels in the group. These are 2 Vec2, which is inefficient because all pixel locations can be defined via
uint64_tas the x and y coordinates, and notdecimal(Floating Point types take a lot more math to store and do calculations around).b) An
unordered_set(hash set) of all the pixels asVec2. Same inefficiency.Note
For context,
componentPointsused to be astd::mapthat mapped from pixel index to its group number (L). Of course, not every pixel will have a value (because not all pixels belong to a group), but by instead using a size-constant heap array as the map (much bigger than the internal memory used bystd::map, the runtime decreased by half, even though memory usage went up.Potential Solutions
1) Standard Template Objects
At the beginning of the algorithm we define 3 types:
components: Maps from group number to group (from return type)equivalencies: Maps group numbers to equivalent group numbers.componentPoints: Maps from pixel index to group number (already optimized).The first two fields can be optimized. You will notice that their maximum capacity would be the number of possible groups, and the key is always the group number (at the moment, the group number starts at 1, but that can always be changed to be zero based. I don't recommend changing this so you take advantage of zero initialization). Thus, you can convert both fields to arrays + size trackers if you know their maximum capacity. You may conduct a proof on this, but the maximum size would be the ceiling of image area / 4, or
where$w$ and $h$ are the image width and height.
Tip
You don't need to invoke
std::ceilfor this. Think of how you can natively use/and%operators together.2) Returning an
std::vectorwith non-increasing sizeThis eliminates need for a vector really, and if the user doesn't change the size of the array, we should just return a struct with the array and its size (still has to be heap-allocated). This could be integrated with the previous step!
3) Return Type's Element Type
A few optimizations can be performed here:
structdefinition for aVec2(do not call it that) that usesuint32_tas the underlying type and substitute it throughout the return type. Also provide the maximum size tostd::unordered_set(the image's area) viastd::unordered_set::reserve(size_t count)std::unordered_setfrom the return type and just return thestd::moveedcomponentPointsalong with the metadata on each group. You will need to modify the return type in this case to be astruct, and properly set equivalent groups in a parallel manner that's being done in the algorithm within thecomponentPoints.