⚡️ Speed up function boxes_self_iou by 79%
#49
📄 79% (0.79x) speedup for boxes_self_iou in unstructured/partition/pdf_image/pdfminer_processing.py
⏱️ Runtime: 6.11 milliseconds → 3.41 milliseconds (best of 31 runs)
📝 Explanation and details
The optimized code achieves a 79% speedup by replacing NumPy's vectorized operations with Numba-compiled JIT functions for the core IoU computation.
Key Optimizations:
Numba JIT Compilation: The critical areas_of_boxes_and_intersection_area function is replaced with _areas_of_boxes_and_intersection_area_numba and _boxes_iou_numba, both decorated with @njit(cache=True, fastmath=True). This compiles the functions to native machine code, eliminating Python interpreter overhead.
Explicit Loop Implementation: Instead of NumPy's vectorized operations with array broadcasting and transpose operations, the optimized version uses explicit nested loops. While this seems counterintuitive, Numba makes these loops extremely fast while avoiding the memory allocation overhead of intermediate arrays.
Memory Efficiency: The explicit loops avoid creating large intermediate arrays that NumPy's vectorized operations would generate (like boxb_area.T and broadcast operations), reducing memory pressure and cache misses.
Type Consistency: The code ensures float64 compatibility for Numba functions, converting input arrays when necessary.
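The explicit-loop idea above can be illustrated with a minimal sketch. This is not the code from this PR: the function names pairwise_iou_loops and self_iou_matrix are hypothetical, the sketch uses the plain textbook IoU definition (the project's own area formula may differ), and it omits cache=True so that it also runs from an interactive session. If Numba is not installed, a no-op decorator stands in so the loops still run as ordinary Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption for portability): a no-op decorator so the
    # sketch still runs, just without JIT compilation.
    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(fastmath=True)
def pairwise_iou_loops(boxes):
    """IoU of every box against every other box; boxes is (n, 4) as x1, y1, x2, y2."""
    n = boxes.shape[0]
    out = np.zeros((n, n), dtype=np.float64)
    for i in range(n):
        ax1 = boxes[i, 0]
        ay1 = boxes[i, 1]
        ax2 = boxes[i, 2]
        ay2 = boxes[i, 3]
        area_a = (ax2 - ax1) * (ay2 - ay1)
        for j in range(n):
            bx1 = boxes[j, 0]
            by1 = boxes[j, 1]
            bx2 = boxes[j, 2]
            by2 = boxes[j, 3]
            # Overlap extent along each axis; non-positive means no overlap.
            w = min(ax2, bx2) - max(ax1, bx1)
            h = min(ay2, by2) - max(ay1, by1)
            if w > 0.0 and h > 0.0:
                inter = w * h
                area_b = (bx2 - bx1) * (by2 - by1)
                union = area_a + area_b - inter
                if union > 0.0:
                    out[i, j] = inter / union
    return out


def self_iou_matrix(boxes):
    """Hypothetical wrapper: cast to contiguous float64 before the compiled call."""
    arr = np.ascontiguousarray(boxes, dtype=np.float64)
    return pairwise_iou_loops(arr)


example_boxes = [[0, 0, 2, 2], [1, 1, 3, 3], [5, 5, 6, 6]]
iou = self_iou_matrix(example_boxes)
```

Only the (n, n) result array is allocated here; each intersection and union is a scalar inside the loop, which is where the savings over broadcast intermediates would come from. The float64 cast in the wrapper mirrors the type-consistency point, since a compiled function is specialized per input dtype.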
Performance Impact:
The optimization particularly benefits scenarios with frequent IoU calculations on moderate-sized bounding box sets, where the overhead of NumPy's array operations and memory allocations becomes significant compared to Numba's optimized machine code execution.
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
partition/pdf_image/test_pdfminer_processing.py::test_boxes_self_iou
🌀 Generated Regression Tests and Runtime
To edit these changes, git checkout codeflash/optimize-boxes_self_iou-mjdenx7e and push.