⚡️ Speed up method VertexAIEmbeddingEncoder._add_embeddings_to_elements by 85%
#62
📄 85% (0.85x) speedup for `VertexAIEmbeddingEncoder._add_embeddings_to_elements` in `unstructured/embed/vertexai.py`
⏱️ Runtime: 195 microseconds → 105 microseconds (best of 250 runs)
📝 Explanation and details
The optimization achieves an 85% speedup by eliminating the need for manual indexing and list building. The key changes are:
What was optimized:
- Replaced `enumerate()` with `zip()`: instead of `for i, element in enumerate(elements)` followed by `embeddings[i]`, the code now uses `for element, embedding in zip(elements, embeddings)` to iterate over both collections simultaneously.
- Removed the `elements_w_embedding = []` list and its `.append()` operations, since the function mutates elements in place and returns the original `elements` list.

Why this is faster:
- The `embeddings[i]` lookup on each iteration requires bounds checking and index calculation; `zip()` provides direct element access without indexing.
- Building the `elements_w_embedding` list added ~35.6% of the original runtime overhead according to the profiler.
- `zip()` creates an iterator that processes elements sequentially without additional memory allocations.

Performance impact based on test results:
The optimization is particularly effective for larger datasets, which is important since embedding operations typically process batches of documents. The function maintains identical behavior - elements are still mutated in-place and the same list is returned.
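To make the change concrete, here is a minimal sketch of the before/after shape of the method. The element class and the `embeddings` attribute name are illustrative assumptions, not the exact definitions from `unstructured/embed/vertexai.py`:

```python
from typing import Any, List


class Element:
    """Hypothetical stand-in for an unstructured document element."""

    embeddings: Any = None


def add_embeddings_original(elements: List[Element], embeddings: List[Any]) -> List[Element]:
    # Original pattern: index-based lookup plus an extra result list.
    elements_w_embedding = []
    for i, element in enumerate(elements):
        element.embeddings = embeddings[i]  # bounds-checked index lookup each iteration
        elements_w_embedding.append(element)  # redundant: elements is already mutated
    return elements


def add_embeddings_optimized(elements: List[Element], embeddings: List[Any]) -> List[Element]:
    # Optimized pattern: zip() pairs each element with its embedding directly,
    # and the function simply returns the in-place-mutated input list.
    for element, embedding in zip(elements, embeddings):
        element.embeddings = embedding
    return elements
```

Both versions mutate the input elements and return the same list object, which is why the change is behavior-preserving.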
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-VertexAIEmbeddingEncoder._add_embeddings_to_elements-mje14as7` and push.