support global index in python #6995
Pull request overview
This PR adds support for global index functionality in the Python client, specifically implementing FAISS vector search capabilities. The implementation enables Python to read FAISS vector indexes created by Java and perform vector similarity searches.
Changes:
- Implements global index infrastructure including readers, evaluators, and scan builders
- Adds FAISS vector index support with reader and index wrappers
- Extends table scanning and reading to support vector searches with row filtering
- Introduces supporting data structures (RoaringBitmap64, Range, IndexedSplit)
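To make the idea of a vector similarity search with row filtering concrete, here is a minimal brute-force sketch. It is not FAISS and not the code in this PR: FAISS answers the same kind of query with dedicated index structures, while this version simply scans every vector. The names `l2_distance`, `top_k_rows`, and `allowed_row_ids` are hypothetical and chosen only for illustration.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_rows(
    vectors: Dict[int, Sequence[float]],
    query: Sequence[float],
    k: int,
    allowed_row_ids: Optional[Set[int]] = None,
) -> List[Tuple[int, float]]:
    """Return up to k (row_id, distance) pairs closest to the query.

    Illustrative assumption: `vectors` maps a row ID to its vector, and
    `allowed_row_ids`, when given, restricts the search to those rows.
    """
    candidates = (
        (l2_distance(vec, query), row_id)
        for row_id, vec in vectors.items()
        if allowed_row_ids is None or row_id in allowed_row_ids
    )
    # nsmallest orders by distance first, then by row ID on ties.
    return [(row_id, dist) for dist, row_id in heapq.nsmallest(k, candidates)]


vectors = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 2.0], 3: [3.0, 4.0]}
unfiltered = top_k_rows(vectors, [0.0, 0.0], k=2)
filtered = top_k_rows(vectors, [0.0, 0.0], k=2, allowed_row_ids={2, 3})
```

In this toy data set the unfiltered query returns rows 0 and 1, while restricting the search to rows 2 and 3 returns those two rows with their distances, which is the kind of row ID plus score pairing that a vector search result carries.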
Reviewed changes
Copilot reviewed 37 out of 37 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| paimon-python/pypaimon/utils/file_store_path_factory.py | Adds IndexPathFactory for managing global index file paths |
| paimon-python/pypaimon/tests/test_global_index.py | E2E tests for FAISS vector index functionality |
| paimon-python/pypaimon/table/file_store_table.py | Adds global index scan builder factory method |
| paimon-python/pypaimon/read/table_scan.py | Integrates vector search into table scanning |
| paimon-python/pypaimon/read/table_read.py | Implements row range filtering for indexed splits |
| paimon-python/pypaimon/read/split.py | Refactors Split as class with base interface |
| paimon-python/pypaimon/read/scanner/full_starting_scanner.py | Adds global index evaluation and split wrapping |
| paimon-python/pypaimon/read/read_builder.py | Adds vector search configuration method |
| paimon-python/pypaimon/manifest/index_manifest_file.py | Supports reading both Avro and JSON index manifests with global index metadata |
| paimon-python/pypaimon/index/index_file_meta.py | Updates to include global index metadata |
| paimon-python/pypaimon/index/index_file_handler.py | Handler for scanning index manifest entries |
| paimon-python/pypaimon/globalindex/vector_search_result.py | Vector search result with score tracking |
| paimon-python/pypaimon/globalindex/vector_search.py | VectorSearch query object |
| paimon-python/pypaimon/globalindex/roaring_bitmap.py | RoaringBitmap64 implementation for row ID sets |
| paimon-python/pypaimon/globalindex/range.py | Range utilities for row ID ranges |
| paimon-python/pypaimon/globalindex/indexed_split.py | Split wrapper with row ranges and scores |
| paimon-python/pypaimon/globalindex/global_index_scan_builder_impl.py | Implementation of global index scan builder |
| paimon-python/pypaimon/globalindex/global_index_scan_builder.py | Builder interface and parallel scanning |
| paimon-python/pypaimon/globalindex/global_index_result.py | Result container for global index queries |
| paimon-python/pypaimon/globalindex/global_index_reader.py | Reader interface for global indexes |
| paimon-python/pypaimon/globalindex/global_index_meta.py | Metadata classes for global indexes |
| paimon-python/pypaimon/globalindex/global_index_evaluator.py | Evaluator for filtering with global indexes |
| paimon-python/pypaimon/globalindex/faiss/faiss_vector_reader.py | FAISS vector index reader implementation |
| paimon-python/pypaimon/globalindex/faiss/faiss_options.py | FAISS configuration options |
| paimon-python/pypaimon/globalindex/faiss/faiss_index_meta.py | FAISS index metadata serialization |
| paimon-python/pypaimon/globalindex/faiss/faiss_index.py | FAISS index wrapper with multiple index types |
| paimon-python/pypaimon/common/options/core_options.py | Adds global index and vector search configuration options |
| paimon-python/dev/run_mixed_tests.sh | Adds FAISS vector index E2E test |
| paimon-python/dev/requirements.txt | Adds faiss-cpu dependency |
| paimon-faiss/.../JavaPyFaissE2ETest.java | Java test for writing FAISS indexes |
| .github/workflows/utitcase.yml | Refactors to use reusable FAISS native build workflow |
| .github/workflows/paimon-python-checks.yml | Integrates FAISS native library build |
| .github/workflows/build-faiss-native.yml | Reusable workflow for building FAISS native libraries |
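The table mentions row range filtering for indexed splits and range utilities for row IDs. As a rough sketch of what filtering by row ranges can look like, the example below keeps only the row IDs that fall inside a list of ranges. It assumes inclusive `(start, end)` ranges that are sorted and non-overlapping; the function name `rows_in_ranges` is hypothetical, and none of this is taken from the PR's actual `Range` or `IndexedSplit` classes.

```python
from bisect import bisect_right
from typing import Iterable, List, Sequence, Tuple


def rows_in_ranges(
    row_ids: Iterable[int],
    ranges: Sequence[Tuple[int, int]],
) -> List[int]:
    """Keep the row IDs covered by any inclusive (start, end) range.

    Assumes `ranges` is sorted by start and has no overlapping entries.
    """
    starts = [start for start, _ in ranges]
    kept = []
    for row_id in row_ids:
        # Find the last range whose start is <= row_id, if any.
        i = bisect_right(starts, row_id) - 1
        if i >= 0 and row_id <= ranges[i][1]:
            kept.append(row_id)
    return kept


kept_rows = rows_in_ranges(range(12), [(2, 4), (8, 9)])
```

Binary search over the range starts keeps each lookup logarithmic in the number of ranges, which avoids materializing every allowed row ID as a set when the ranges are long.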
Comments suppressed due to low confidence (2)
paimon-python/pypaimon/read/split.py:1
- The refactoring from a dataclass to an explicit class constructor changes the initialization pattern. Ensure all call sites are updated to use named parameters instead of positional arguments. The old dataclass allowed `field(default_factory=dict)`, which is now replaced with `or {}` logic in the constructor.
paimon-python/pypaimon/read/table_read.py:1
- The row filtering uses a Python loop that checks set membership for each row, which could be inefficient for large batches. Consider vectorizing this operation with numpy (by converting allowed_row_ids to a numpy array) or with pyarrow compute functions for better performance.
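The split.py comment above contrasts `field(default_factory=dict)` with `or {}` logic. The two are close but not identical, and a small sketch shows where they differ. The classes `DataclassSplit` and `ExplicitSplit` and the `extra` field are hypothetical stand-ins, not the real pypaimon `Split`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DataclassSplit:
    # Hypothetical stand-in for the old dataclass style.
    bucket: int
    extra: Dict[str, str] = field(default_factory=dict)


class ExplicitSplit:
    # Hypothetical stand-in for the explicit constructor with `or {}`.
    def __init__(self, bucket: int, extra: Optional[Dict[str, str]] = None):
        self.bucket = bucket
        # Any falsy argument (None or an empty dict) becomes a new dict.
        self.extra = extra or {}


shared: Dict[str, str] = {}
a = DataclassSplit(bucket=0, extra=shared)
b = ExplicitSplit(bucket=0, extra=shared)

a_keeps_caller_dict = a.extra is shared  # the dataclass stores the caller's dict
b_keeps_caller_dict = b.extra is shared  # `or {}` swaps an empty dict for a new one
```

Both styles give each instance its own default dictionary. The difference only appears when a caller passes an empty dictionary and later mutates it: the dataclass version sees the mutation, while the `or {}` version does not. If that matters, `extra if extra is not None else {}` preserves the caller's object.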
JingsongLi
left a comment
+1
Purpose
Support the global index in Python, specifically FAISS vector index search.
Tests
API and Format
Documentation