[QDP] Numpy IO support #777

ryankert01 · 2025-12-31T16:44:51Z

Purpose of PR

Ran some benchmarks and (maybe) tests on Colab:
https://colab.research.google.com/drive/1NuAyAFsko3uD8MMBTDQpob4zSxg_GeiC?usp=sharing

Related Issues or PRs

Closes #722

Changes Made

Breaking Changes

Yes
No

Checklist

Added or updated unit tests for all changes
Added or updated documentation for all changes
Successfully built and ran all unit tests or manual tests locally
PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
Code follows ASF guidelines

rich7420 · 2026-01-03T14:39:13Z

@ryankert01 thanks for the patch

rich7420 · 2026-01-03T14:40:14Z

Should we add a file size check, like the other formats have, to prevent OOM?
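For illustration only, a minimal sketch of what such a guard could look like using just the standard library. The function name `check_file_size` and the `max_bytes` parameter are hypothetical; the thread does not show how the other readers implement their check or what limit they use.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical guard: returns the file size in bytes if it is within
/// `max_bytes`, or an error if the file is larger (or cannot be inspected).
fn check_file_size<P: AsRef<Path>>(path: P, max_bytes: u64) -> io::Result<u64> {
    // `metadata` reads only file-system metadata, not the file contents.
    let len = fs::metadata(path.as_ref())?.len();
    if len > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("file is {} bytes, which exceeds the {}-byte limit", len, max_bytes),
        ));
    }
    Ok(len)
}
```

A reader could call this before loading anything, e.g. `check_file_size(path, limit)?`, where `limit` is whatever cap the project settles on.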

rich7420 · 2026-01-03T14:42:25Z

qdp/qdp-core/src/readers/numpy.rs

+        let path = path.as_ref();
+
+        // Verify file exists
+        if !path.exists() {


I think we could add this path.exists() check to the other formats too.
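One way this might be shared across readers is a small helper, sketched below. The name `ensure_file_exists` is an assumption for illustration and does not come from the patch; the error type used by the real readers is not shown in this thread, so `std::io::Error` stands in for it.

```rust
use std::io;
use std::path::Path;

/// Hypothetical shared helper: fail early with a clear error when `path`
/// does not exist, mirroring the `path.exists()` check in the NumPy reader.
fn ensure_file_exists<P: AsRef<Path>>(path: P) -> io::Result<()> {
    let path = path.as_ref();
    if !path.exists() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("file not found: {}", path.display()),
        ));
    }
    Ok(())
}
```

Note that an existence check only improves the error message; the file can still disappear before it is opened, so the open call must handle failure as well.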

rich7420 · 2026-01-03T14:47:57Z

qdp/qdp-core/src/readers/numpy.rs

+
+        // Flatten to Vec<f64>
+        // Handle both C-contiguous (row-major) and Fortran-contiguous (column-major)
+        let data = if array.is_standard_layout() {


I think this one is great!
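To make the layout distinction concrete, here is a standalone sketch of the column-major case. It does not use the `ndarray` API from the patch; `fortran_to_row_major` is a hypothetical name, and the function only shows the index arithmetic involved when a Fortran-ordered buffer has to be flattened into row-major order.

```rust
/// Converts a column-major (Fortran-order) buffer of a `rows x cols` matrix
/// into a row-major (C-order) Vec<f64>.
fn fortran_to_row_major(data: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    assert_eq!(data.len(), rows * cols, "buffer length must equal rows * cols");
    let mut out = Vec::with_capacity(data.len());
    for r in 0..rows {
        for c in 0..cols {
            // In column-major storage, element (r, c) lives at index c * rows + r.
            out.push(data[c * rows + r]);
        }
    }
    out
}
```

For the matrix [[1, 2, 3], [4, 5, 6]], the column-major buffer is [1, 4, 2, 5, 3, 6], and the function returns the values in row order. A C-contiguous array needs no such reordering, which is why the standard-layout branch can take the data as-is.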

rich7420 · 2026-01-03T14:49:03Z

overall LGTM

400Ping · 2026-01-04T09:31:57Z

Should we add a file size check, like the other formats have, to prevent OOM?

I agree with this one.

400Ping · 2026-01-04T09:49:29Z

I have thought of two more memory-friendly alternatives to the current read_npy -> Array2 -> flattened Vec flow:

Streaming/iterator reading: parse the .npy header (dtype/shape/order), then read the flat data from the file in small chunks, so the whole file doesn’t need to fit in RAM.
Memory-mapping (mmap): map the .npy file into memory (the OS loads pages on demand), parse the header to locate the data region, and avoid the extra “read + flatten copy” memory peak while also enabling easy slicing/random access.

We can do this in a follow-up if needed.
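A rough sketch of the streaming idea, using only the standard library. All function names here are hypothetical, and the sketch assumes little-endian float64 data (`'<f8'`) in C order; it does not validate the dtype or handle Fortran order. It relies on the documented .npy layout: the magic bytes `\x93NUMPY`, two version bytes, a little-endian header length (u16 for version 1.x, u32 for 2.x and later), then the header dictionary as text, followed directly by the raw data. The mmap variant is not shown because it would need an external crate.

```rust
use std::io::{self, Read};

/// Reads the .npy preamble and returns the header dictionary as text.
/// Afterwards the reader is positioned at the first data byte.
fn read_npy_header<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut preamble = [0u8; 8]; // 6 magic bytes + major + minor version
    reader.read_exact(&mut preamble)?;
    if !preamble.starts_with(b"\x93NUMPY") {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a .npy file"));
    }
    // Version 1.x stores the header length as u16; later versions use u32.
    let header_len = if preamble[6] == 1 {
        let mut len = [0u8; 2];
        reader.read_exact(&mut len)?;
        u16::from_le_bytes(len) as usize
    } else {
        let mut len = [0u8; 4];
        reader.read_exact(&mut len)?;
        u32::from_le_bytes(len) as usize
    };
    let mut header = vec![0u8; header_len];
    reader.read_exact(&mut header)?;
    String::from_utf8(header).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Extracts the `shape` tuple from the header text, e.g. "(2, 3)" -> [2, 3].
fn parse_shape(header: &str) -> Option<Vec<usize>> {
    let after_key = &header[header.find("'shape'")?..];
    let open = after_key.find('(')?;
    let close = after_key.find(')')?;
    if close <= open {
        return None;
    }
    after_key[open + 1..close]
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.parse().ok())
        .collect()
}

/// Streams `total_elems` little-endian f64 values, handing them to `on_chunk`
/// in slices of at most `chunk_elems`, so only one chunk is in memory at a time.
fn stream_f64_chunks<R: Read, F: FnMut(&[f64])>(
    reader: &mut R,
    total_elems: usize,
    chunk_elems: usize,
    mut on_chunk: F,
) -> io::Result<()> {
    if chunk_elems == 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "chunk_elems must be > 0"));
    }
    let mut bytes = vec![0u8; chunk_elems * 8];
    let mut values = Vec::with_capacity(chunk_elems);
    let mut remaining = total_elems;
    while remaining > 0 {
        let n = remaining.min(chunk_elems);
        reader.read_exact(&mut bytes[..n * 8])?;
        values.clear();
        for raw in bytes[..n * 8].chunks_exact(8) {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(raw);
            values.push(f64::from_le_bytes(buf));
        }
        on_chunk(&values);
        remaining -= n;
    }
    Ok(())
}
```

In use, the caller would wrap the file in a `BufReader`, call `read_npy_header`, take the product of `parse_shape` as `total_elems`, and then process each chunk in the callback. Peak memory is then bounded by the chunk size rather than the file size.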

400Ping

Overall LGTM

guan404ming · 2026-01-04T17:41:47Z

Should we add a file size check, like the other formats have, to prevent OOM?

This one is a nice suggestion!

ryankert01 added 2 commits December 31, 2025 16:05

feat: add numpy as reader

9580062

add benchmark

7bc7ee3

ryankert01 changed the base branch from main to dev-qdp December 31, 2025 16:45

ryankert01 mentioned this pull request Dec 31, 2025

MAHOUT-725: [QDP] PyTorch Tensor Detection and CPU Path #763

Merged

14 tasks

rich7420 reviewed Jan 3, 2026

400Ping approved these changes Jan 4, 2026

guan404ming approved these changes Jan 4, 2026

guan404ming merged commit 57f90e6 into apache:dev-qdp Jan 4, 2026
4 checks passed

This was referenced Jan 4, 2026

[QDP][followup] Numpy file size check to prevent OOM #787

Open

[QDP][followup] add path.exists() in other format. #788

Closed

[QDP] Numpy input potential speed & mem improv #789

Open

4 participants
