@jerry-024 jerry-024 commented Jan 9, 2026

Purpose

Support reading global indexes in Python, specifically FAISS vector index search.

Tests

  • TestFaissVectorGlobalIndexE2E
  • JavaPyFaissE2ETest

API and Format

Documentation

@jerry-024 jerry-024 marked this pull request as draft January 9, 2026 10:04
@jerry-024 jerry-024 force-pushed the support_global_index_python branch 2 times, most recently from 01e683d to 03261b9 Compare January 13, 2026 06:58
@jerry-024 jerry-024 force-pushed the support_global_index_python branch from 03261b9 to 4acbdee Compare January 13, 2026 07:03
@jerry-024 jerry-024 force-pushed the support_global_index_python branch from 5f63a9b to 02f3e51 Compare January 13, 2026 07:22
@jerry-024 jerry-024 force-pushed the support_global_index_python branch 2 times, most recently from de81c5a to 3e375d5 Compare January 13, 2026 08:59
@jerry-024 jerry-024 force-pushed the support_global_index_python branch 2 times, most recently from dc2279c to 1fa9ae9 Compare January 13, 2026 10:22

Copilot AI left a comment

Pull request overview

This PR adds support for global index functionality in the Python client, specifically implementing FAISS vector search capabilities. The implementation enables Python to read FAISS vector indexes created by Java and perform vector similarity searches.
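For orientation, a vector similarity search returns the row IDs of the stored vectors closest to a query, each with a score. The sketch below is a brute-force stand-in written with the standard library only; it does not use FAISS or any pypaimon API, and the names `l2_distance` and `top_k` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: Sequence[float],
          vectors: Sequence[Sequence[float]],
          k: int) -> List[Tuple[int, float]]:
    """Return the k nearest (row_id, distance) pairs, closest first.

    The row ID is assumed to be the vector's position in `vectors`.
    """
    scored = ((row_id, l2_distance(query, vec))
              for row_id, vec in enumerate(vectors))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Four stored vectors; the query is closest to row 1, then row 0.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
hits = top_k([0.9, 0.1], vectors, k=2)
```

An index such as FAISS exists to avoid this full scan, but the shape of the result (row IDs plus scores) is what downstream row filtering would consume.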

Changes:

  • Implements global index infrastructure including readers, evaluators, and scan builders
  • Adds FAISS vector index support with reader and index wrappers
  • Extends table scanning and reading to support vector searches with row filtering
  • Introduces supporting data structures (RoaringBitmap64, Range, IndexedSplit)
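As a rough illustration of the row-range idea behind the supporting data structures listed above, the following sketch collapses a set of matching row IDs into contiguous inclusive ranges and tests membership against them. It is an assumption-laden stand-in, not the PR's `Range` or `RoaringBitmap64` code; `to_ranges` and `contains` are hypothetical helper names.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple

RowRange = Tuple[int, int]  # inclusive (start, end), hypothetical representation


def to_ranges(row_ids: Iterable[int]) -> List[RowRange]:
    """Collapse row IDs into sorted, non-overlapping inclusive ranges."""
    ranges: List[RowRange] = []
    for row_id in sorted(set(row_ids)):
        if ranges and row_id == ranges[-1][1] + 1:
            # Extend the current range when the ID is adjacent to its end.
            ranges[-1] = (ranges[-1][0], row_id)
        else:
            ranges.append((row_id, row_id))
    return ranges


def contains(ranges: List[RowRange], row_id: int) -> bool:
    """Binary-search the sorted ranges for the one that may hold row_id."""
    starts = [start for start, _ in ranges]
    i = bisect_right(starts, row_id) - 1
    return i >= 0 and ranges[i][1] >= row_id


ranges = to_ranges([7, 3, 4, 5, 9, 10])
```

Ranges keep the filter compact when search hits cluster in adjacent rows, which is one plausible reason to carry row ranges on a split rather than individual IDs.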

Reviewed changes

Copilot reviewed 37 out of 37 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| paimon-python/pypaimon/utils/file_store_path_factory.py | Adds IndexPathFactory for managing global index file paths |
| paimon-python/pypaimon/tests/test_global_index.py | E2E tests for FAISS vector index functionality |
| paimon-python/pypaimon/table/file_store_table.py | Adds global index scan builder factory method |
| paimon-python/pypaimon/read/table_scan.py | Integrates vector search into table scanning |
| paimon-python/pypaimon/read/table_read.py | Implements row range filtering for indexed splits |
| paimon-python/pypaimon/read/split.py | Refactors Split as class with base interface |
| paimon-python/pypaimon/read/scanner/full_starting_scanner.py | Adds global index evaluation and split wrapping |
| paimon-python/pypaimon/read/read_builder.py | Adds vector search configuration method |
| paimon-python/pypaimon/manifest/index_manifest_file.py | Supports reading both Avro and JSON index manifests with global index metadata |
| paimon-python/pypaimon/index/index_file_meta.py | Updates to include global index metadata |
| paimon-python/pypaimon/index/index_file_handler.py | Handler for scanning index manifest entries |
| paimon-python/pypaimon/globalindex/vector_search_result.py | Vector search result with score tracking |
| paimon-python/pypaimon/globalindex/vector_search.py | VectorSearch query object |
| paimon-python/pypaimon/globalindex/roaring_bitmap.py | RoaringBitmap64 implementation for row ID sets |
| paimon-python/pypaimon/globalindex/range.py | Range utilities for row ID ranges |
| paimon-python/pypaimon/globalindex/indexed_split.py | Split wrapper with row ranges and scores |
| paimon-python/pypaimon/globalindex/global_index_scan_builder_impl.py | Implementation of global index scan builder |
| paimon-python/pypaimon/globalindex/global_index_scan_builder.py | Builder interface and parallel scanning |
| paimon-python/pypaimon/globalindex/global_index_result.py | Result container for global index queries |
| paimon-python/pypaimon/globalindex/global_index_reader.py | Reader interface for global indexes |
| paimon-python/pypaimon/globalindex/global_index_meta.py | Metadata classes for global indexes |
| paimon-python/pypaimon/globalindex/global_index_evaluator.py | Evaluator for filtering with global indexes |
| paimon-python/pypaimon/globalindex/faiss/faiss_vector_reader.py | FAISS vector index reader implementation |
| paimon-python/pypaimon/globalindex/faiss/faiss_options.py | FAISS configuration options |
| paimon-python/pypaimon/globalindex/faiss/faiss_index_meta.py | FAISS index metadata serialization |
| paimon-python/pypaimon/globalindex/faiss/faiss_index.py | FAISS index wrapper with multiple index types |
| paimon-python/pypaimon/common/options/core_options.py | Adds global index and vector search configuration options |
| paimon-python/dev/run_mixed_tests.sh | Adds FAISS vector index E2E test |
| paimon-python/dev/requirements.txt | Adds faiss-cpu dependency |
| paimon-faiss/.../JavaPyFaissE2ETest.java | Java test for writing FAISS indexes |
| .github/workflows/utitcase.yml | Refactors to use reusable FAISS native build workflow |
| .github/workflows/paimon-python-checks.yml | Integrates FAISS native library build |
| .github/workflows/build-faiss-native.yml | Reusable workflow for building FAISS native libraries |

Comments suppressed due to low confidence (2)

paimon-python/pypaimon/read/split.py:1

  • The refactoring from dataclass to explicit class constructor changes the initialization patterns. Ensure all call sites are updated to use named parameters instead of positional arguments. The old dataclass allowed `field(default_factory=dict)`, which is now replaced with `or {}` logic in the constructor.

paimon-python/pypaimon/read/table_read.py:1

  • The row filtering uses a Python loop checking membership in a set for each row. For large batches, this could be inefficient. Consider vectorizing this operation using numpy operations if `allowed_row_ids` is converted to a numpy array, or using pyarrow compute functions for better performance.
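The vectorization suggested for table_read.py could look roughly like the sketch below, which compares a per-row set lookup with a single `numpy.isin` call. This is a minimal illustration under assumptions, not the code in the PR: the function names and the sample row IDs are hypothetical, and only `allowed_row_ids` is taken from the comment.

```python
import numpy as np


def filter_rows_loop(row_ids, allowed_row_ids):
    """Per-row membership test, the pattern the comment describes."""
    allowed = set(allowed_row_ids)
    return [pos for pos, row_id in enumerate(row_ids) if row_id in allowed]


def filter_rows_vectorized(row_ids, allowed_row_ids):
    """Same positions, computed with one vectorized membership test."""
    ids = np.asarray(row_ids, dtype=np.int64)
    allowed = np.fromiter(allowed_row_ids, dtype=np.int64)
    mask = np.isin(ids, allowed)  # boolean mask, one entry per row
    return np.flatnonzero(mask).tolist()


# Hypothetical batch: row IDs 100..105, of which two are allowed.
batch_row_ids = [100, 101, 102, 103, 104, 105]
allowed_row_ids = {101, 104, 999}

loop_positions = filter_rows_loop(batch_row_ids, allowed_row_ids)
vectorized_positions = filter_rows_vectorized(batch_row_ids, allowed_row_ids)
```

With Arrow batches, the boolean mask could instead be produced by `pyarrow.compute.is_in` and passed to the batch's `filter` method, avoiding the round trip through Python lists.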

@jerry-024 jerry-024 marked this pull request as ready for review January 14, 2026 03:06
@jerry-024 jerry-024 force-pushed the support_global_index_python branch from f0b79be to a69bab1 Compare January 14, 2026 03:19
@apache apache deleted a comment from Copilot AI Jan 14, 2026

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit 4bde8fc into apache:master Jan 14, 2026
18 checks passed