As discussed in issue #804, the system-level libarrow.so provided in standard manylinux environments (or installed via system package managers) is often incomplete or lacks necessary components for our use case.
A more robust solution is to link graphar.so directly against the libarrow.so bundled within the pyarrow Python package. This ensures we use a full-featured Arrow library that matches the Python environment.
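As a minimal sketch of the build-time side, pyarrow itself exposes helpers for compiling extensions against its bundled libraries: pyarrow.get_include(), pyarrow.get_library_dirs(), pyarrow.get_libraries() and pyarrow.create_library_symlinks(). The wrapper functions arrow_build_paths and linker_flags are hypothetical names used for illustration and are not part of GraphAr's current build.

```python
import importlib


def arrow_build_paths():
    """Query the installed pyarrow wheel for its Arrow C++ include dir, library dirs and names."""
    # Imported lazily so this module can be loaded even where pyarrow is absent.
    pa = importlib.import_module("pyarrow")
    # Wheels ship ABI-versioned files (e.g. libarrow.so.<abi>); this creates unversioned
    # symlinks next to them so that a plain -larrow can resolve at link time.
    pa.create_library_symlinks()
    return pa.get_include(), pa.get_library_dirs(), pa.get_libraries()


def linker_flags(library_dirs, libraries):
    """Turn library directories and library names into -L/-l flags for the link step."""
    return [f"-L{d}" for d in library_dirs] + [f"-l{name}" for name in libraries]
```

At build time, the returned include directory and the output of linker_flags(lib_dirs, libs) would be handed to the compiler or to CMake. Note that create_library_symlinks() writes into the pyarrow installation directory, so the build environment needs write access there.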
However, adopting this approach introduces several significant build and runtime challenges described below.
The dependency relationship is illustrated as follows:
graph TD
A[pyarrow bundled libarrow.so] --> B[pyarrow.whl]
A --> C[graphar.so <br> C++ Core]
C --> D[graphar.whl <br> Python Binding]
B -.-> D
style A fill:#f9f,stroke:#333,stroke-width:2px
style D fill:#bbf,stroke:#333,stroke-width:2px
Key Challenges
1. ABI Compatibility (The "Segfault" Risk)
The C++ ABI (Application Binary Interface) of Apache Arrow is not guaranteed to be stable across major versions.
Risk: If graphar.so is built against the libarrow.so from pyarrow v14.0.0 but the user upgrades to pyarrow v15.0.0 at runtime, changes in class memory layouts or function signatures could cause immediate segmentation faults.
Difficulty: We need a strategy for managing version constraints that guarantees the build-time Arrow version is ABI-compatible with the runtime Arrow version.
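One possible strategy is to record the pyarrow version used at build time, pin the wheel's runtime requirement to that major version, and re-check it on import so that a mismatch fails with a clear error instead of a crash. The sketch below assumes the Arrow C++ ABI only changes between major versions (an assumption to verify against Arrow's release policy); BUILD_PYARROW_VERSION, pyarrow_requirement and check_pyarrow_abi are hypothetical names.

```python
# Hypothetical: value written into the package at build time, e.g. from pyarrow.__version__.
BUILD_PYARROW_VERSION = "14.0.0"


def major(version):
    """Major component of a version string such as '14.0.2' or '15.0.0.dev1'."""
    return int(version.split(".", 1)[0])


def pyarrow_requirement(build_version):
    """Runtime requirement pinning pyarrow to the build-time major version (assumed ABI-stable)."""
    return f"pyarrow>={build_version},<{major(build_version) + 1}"


def check_pyarrow_abi(build_version, runtime_version):
    """Fail with ImportError on a major-version mismatch instead of risking a segfault later."""
    if major(build_version) != major(runtime_version):
        raise ImportError(
            f"graphar was built against pyarrow {build_version}, but pyarrow "
            f"{runtime_version} is installed; install '{pyarrow_requirement(build_version)}'"
        )
```

In a hypothetical graphar/__init__.py, check_pyarrow_abi(BUILD_PYARROW_VERSION, pyarrow.__version__) would run before the native module is imported, and pyarrow_requirement(BUILD_PYARROW_VERSION) would feed the wheel's install requirements.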
2. Runtime Linkage (RPATH Resolution)
Unlike system libraries located in /usr/lib, the target libarrow.so resides deep within the Python site-packages/pyarrow directory.
Challenge: Neither the linker nor the runtime dynamic loader searches this directory by default. graphar.so must therefore be configured (likely via RPATH) to locate libarrow.so relative to its own location at runtime, without forcing users to manually set LD_LIBRARY_PATH.
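The RPATH entry itself can be computed from the install layout. The sketch below assumes graphar.so and the pyarrow package are installed into the same site-packages directory (the directory names are illustrative) and uses the $ORIGIN token, which the Linux dynamic loader expands to the directory containing the object being loaded; on macOS the counterpart is @loader_path. The helpers relative_rpath and rpath_link_flag are hypothetical.

```python
import posixpath


def relative_rpath(extension_dir, pyarrow_dir, origin="$ORIGIN"):
    """RPATH entry that finds pyarrow_dir relative to the directory holding graphar.so.

    Both arguments are absolute install-time directories. The origin token is expanded
    by the dynamic loader at runtime ("$ORIGIN" on Linux, "@loader_path" on macOS),
    so the entry stays valid wherever the shared site-packages directory lives.
    """
    rel = posixpath.relpath(pyarrow_dir, start=extension_dir)
    return origin if rel == "." else f"{origin}/{rel}"


def rpath_link_flag(rpath):
    """Linker flag (GCC/Clang driver syntax) that embeds the RPATH into graphar.so.

    Pass it in an argument list, not a shell string, so $ORIGIN is not expanded by a shell.
    """
    return f"-Wl,-rpath,{rpath}"
```

With site-packages/graphar/ and site-packages/pyarrow/ this yields $ORIGIN/../pyarrow. If the two packages can end up in different locations (for example a --target install or a user site directory), the relative path no longer holds and the strategy needs a fallback.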
3. The "Two Arrows" Problem (ODR Violation)
If this linking is not handled correctly (e.g., if GraphAr accidentally links to a static Arrow or a different system Arrow), we risk loading two different copies of Arrow code into the same process.
Consequence: This would violate the One Definition Rule (ODR). Passing objects (such as a pyarrow.Table) between GraphAr and PyArrow would lead to undefined behavior, data corruption, or crashes.
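A cheap runtime diagnostic for this situation on Linux is to list the libarrow shared objects mapped into the process. The sketch below parses the format of /proc/<pid>/maps; loaded_arrow_libraries is a hypothetical helper. It only detects duplicate shared copies: an Arrow statically linked into graphar.so would not show up as a separate libarrow mapping.

```python
import posixpath


def loaded_arrow_libraries(maps_text):
    """Distinct libarrow shared-object paths listed in Linux /proc/<pid>/maps content."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, [pathname]; pathname may contain spaces.
        fields = line.split(maxsplit=5)
        if len(fields) == 6:  # only file-backed mappings carry a pathname
            path = fields[5].strip()
            # Matches libarrow.so and libarrow.so.<abi>, but not e.g. libarrow_python.so.
            if posixpath.basename(path).startswith("libarrow.so"):
                paths.add(path)
    return sorted(paths)
```

After importing both pyarrow and graphar, loaded_arrow_libraries(open("/proc/self/maps").read()) should return exactly one path, the one inside the pyarrow package; more than one path means two Arrows are loaded.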
Objective
We need to design a build strategy that successfully links against the pyarrow-bundled libraries while solving the RPATH and ABI compatibility issues.
Component(s)
Python, Developer Tools