Motivation
GPU backends (FAISS GPU, cuVS, Metal) need contiguous float32 buffers, but zvec stores vectors in IndexProvider segments. Currently there's no bridge to stream vectors from storage into GPU-ready memory.
Proposed approach
A new C++ class GpuBufferLoader (src/ailego/gpu/gpu_buffer_loader.h) that:
- Iterates over an IndexProvider via its existing Iterator API
- Deserializes ForwardData into float32 vectors
- Fills a contiguous host buffer suitable for GPU upload
- Reports progress (vector count, bytes loaded)
IndexProvider → Iterator → ForwardData → float32 buffer → GPU
This stays within zvec's existing storage architecture — no standalone databases or new storage engines.
Also included: docs/METAL_CPP.md, documenting the Metal kernel architecture and GPU integration points.
Questions for maintainers
- Is IndexProvider::Iterator the right abstraction for bulk vector extraction, or is there a better path?
- Should the buffer loader live under src/ailego/gpu/ or somewhere else?
- Any concerns about memory usage for large collections? (Current approach loads everything into host RAM before GPU upload — we could add batched streaming.)
Draft implementation: #175