⚡️ Speed up function pad_element_bboxes by 548%
#70
📄 548% (5.48x) speedup for `pad_element_bboxes` in `unstructured/partition/pdf_image/pdf_image_utils.py`
⏱️ Runtime: 3.60 milliseconds → 556 microseconds (best of 16 runs)
📝 Explanation and details
The optimized code achieves a 548% speedup by eliminating the expensive `deepcopy` operation that dominated 97% of the original runtime. Here are the key optimizations:
Primary Optimization - Eliminated Deep Copy:
- Replaced `deepcopy(element)` with manual object construction using `type(element).__new__()` and `__dict__.update()`
- Avoids the traversal that `deepcopy` performs on the entire object graph
- `deepcopy` took 22.6ms out of 23.2ms total time in the original
Secondary Optimization - Numba JIT Compilation:
- Added the `@numba.njit(cache=True)` decorator to `_pad_bbox_numba()` for the arithmetic operations
Object Construction Strategy:
- Copies the element's `__dict__` and replaces only the bbox field
Performance Results:
The test cases show consistent 300-600% speedups across all scenarios.
This optimization is particularly valuable for batch processing operations where `pad_element_bboxes` is called repeatedly, as the per-call overhead reduction from ~3.6ms to ~0.56ms can compound significantly in document processing pipelines.
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
`partition/pdf_image/test_ocr.py::test_pad_element_bboxes`
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-pad_element_bboxes-mjefvnyz` and push.
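For readers who want to see the copy strategy from the explanation in isolation, here is a minimal sketch. It is not the code from this PR: `BBox` and `Element` are hypothetical stand-ins for the library's classes, the padding arithmetic (subtract from `x1`/`y1`, add to `x2`/`y2`) is an assumption, and the Numba step is left out so the example runs with the standard library alone.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class BBox:
    """Hypothetical stand-in for an element's bounding box."""
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass
class Element:
    """Hypothetical stand-in for a layout element."""
    bbox: BBox
    text: str = ""
    metadata: dict = field(default_factory=dict)


def pad_with_deepcopy(element, padding):
    """Baseline: copy the whole object graph, then grow the bbox."""
    out = deepcopy(element)
    out.bbox.x1 -= padding
    out.bbox.y1 -= padding
    out.bbox.x2 += padding
    out.bbox.y2 += padding
    return out


def pad_with_shallow_copy(element, padding):
    """Sketch of the strategy described above: new instance, shared attributes."""
    cls = type(element)
    out = cls.__new__(cls)                  # allocate without running __init__
    out.__dict__.update(element.__dict__)   # copy attribute references only
    b = element.bbox
    # Build a fresh bbox so the input element's bbox is never mutated.
    out.bbox = type(b)(b.x1 - padding, b.y1 - padding,
                       b.x2 + padding, b.y2 + padding)
    return out


if __name__ == "__main__":
    original = Element(BBox(10, 20, 30, 40), text="hello")
    padded = pad_with_shallow_copy(original, 5)
    print(padded.bbox, original.bbox)
```

The trade-off is that every attribute other than `bbox` is shared by reference between the input and the result, so later in-place mutation of, say, `metadata` on one would be visible on the other. The approach also relies on instances having a `__dict__`, which classes defined with `__slots__` do not.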