[QDP] The Scaling Test (Latency vs. Qubits) #778

400Ping · 2026-01-01T17:50:02Z

Purpose of PR

Measure loading time for state sizes ranging from N=10 to N=28 qubits.
Hypothesis: Circuit-based methods will degrade exponentially (O(2^N) gates) or linearly (O(N) depth), while QDP latency will remain effectively constant until PCIe bandwidth saturates.
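
For a sense of scale across the N=10 to N=28 range, here is a minimal sketch of how the amplitude vector grows with the qubit count. It assumes 8 bytes per element (float64), which is an assumption on my side rather than something stated in the PR; it does match the data sizes the benchmark prints in the runs posted later in this thread (5.00 MB for 160 samples at 12 qubits). The helper names are hypothetical.

```python
# Sketch: amplitude-vector size as a function of qubit count N.
# Assumption: 8 bytes per element (float64); not confirmed by the PR itself.
BYTES_PER_ELEMENT = 8


def vector_length(n_qubits: int) -> int:
    """Number of amplitudes in an N-qubit state vector (2^N)."""
    return 1 << n_qubits


def vector_mib(n_qubits: int, bytes_per_element: int = BYTES_PER_ELEMENT) -> float:
    """Size of one input vector in MiB under the assumed element width."""
    return vector_length(n_qubits) * bytes_per_element / 2**20


if __name__ == "__main__":
    for n in (10, 12, 18, 24, 28):
        print(f"N={n:2d}  length={vector_length(n):>11,d}  size={vector_mib(n):10.3f} MiB")
```

Under this assumption a single N=28 vector is already 2 GiB, so at the top of the range the host-to-device copy is likely to dominate whatever the encoding kernel costs.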

Related Issues or PRs

Closes #740

Changes Made

Breaking Changes

Yes
No

Checklist

Added or updated unit tests for all changes
Added or updated documentation for all changes
Successfully built and ran all unit tests or manual tests locally
PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
Code follows ASF guidelines

Signed-off-by: 400Ping <fourhundredping@gmail.com>

400Ping · 2026-01-02T01:09:37Z

cc @rich7420 @guan404ming @ryankert01

Signed-off-by: 400Ping <fourhundredping@gmail.com>

rich7420 · 2026-01-04T10:07:30Z

my results:

uv run python benchmark/benchmark_latency.py --qubits 12 --batches 20 --batch-size 8 --prefetch 4
Generating 160 samples of 12 qubits...
  Batch size   : 8
  Vector length: 4096
  Batches      : 20
  Prefetch     : 4
  Frameworks   : pennylane, qiskit-init, qiskit-statevector, mahout
  Generated 160 samples
  PennyLane/Qiskit format: 5.00 MB
  Mahout format: 5.00 MB

======================================================================
DATA-TO-STATE LATENCY BENCHMARK: 12 Qubits, 160 Samples
======================================================================

[PennyLane] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0641 s (0.401 ms/vector)

[Qiskit Initialize] Full Pipeline (DataLoader -> GPU)...
  Total Time: 10.5576 s (65.985 ms/vector)

[Qiskit Statevector] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0329 s (0.205 ms/vector)

[Mahout] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.1763 s (1.102 ms/vector)

======================================================================
LATENCY (Lower is Better)
Samples: 160, Qubits: 12
======================================================================
Qiskit Statevector      0.205 ms/vector
PennyLane               0.401 ms/vector
Mahout                  1.102 ms/vector
Qiskit Initialize      65.985 ms/vector
----------------------------------------------------------------------
Speedup vs PennyLane:            0.36x
Speedup vs Qiskit Init:          59.89x
Speedup vs Qiskit Statevec:       0.19x

uv run python benchmark/benchmark_latency.py --qubits 18 --batches 20 --batch-size 32 --prefetch 4
Generating 640 samples of 18 qubits...
  Batch size   : 32
  Vector length: 262144
  Batches      : 20
  Prefetch     : 4
  Frameworks   : pennylane, qiskit-init, qiskit-statevector, mahout
  Generated 640 samples
  PennyLane/Qiskit format: 1280.00 MB
  Mahout format: 1280.00 MB

======================================================================
DATA-TO-STATE LATENCY BENCHMARK: 18 Qubits, 640 Samples
======================================================================

[PennyLane] Full Pipeline (DataLoader -> GPU)...
  Total Time: 3.4523 s (5.394 ms/vector)

[Qiskit Initialize] Full Pipeline (DataLoader -> GPU)...
  Total Time: 227.0904 s (354.829 ms/vector)

[Qiskit Statevector] Full Pipeline (DataLoader -> GPU)...
  Total Time: 3.7708 s (5.892 ms/vector)

[Mahout] Full Pipeline (DataLoader -> GPU)...
  Total Time: 7.3243 s (11.444 ms/vector)

======================================================================
LATENCY (Lower is Better)
Samples: 640, Qubits: 18
======================================================================
PennyLane               5.394 ms/vector
Qiskit Statevector      5.892 ms/vector
Mahout                 11.444 ms/vector
Qiskit Initialize     354.829 ms/vector
----------------------------------------------------------------------
Speedup vs PennyLane:            0.47x
Speedup vs Qiskit Init:          31.01x
Speedup vs Qiskit Statevec:       0.51x
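
The speedup lines in the two runs above appear to be the baseline latency divided by the Mahout latency; that definition is my assumption, not something read from `benchmark_latency.py`. The sketch below recomputes the ratios from the rounded per-vector numbers copied from the logs, so small differences in the last digit against the printed values are expected. It also prints how much each framework's latency grew from 12 to 18 qubits, with the caveat that the two runs used different batch sizes (8 vs. 32), so the growth factors are only indicative.

```python
# Per-vector latencies in ms, copied from the two runs above.
runs = {
    12: {
        "PennyLane": 0.401,
        "Qiskit Initialize": 65.985,
        "Qiskit Statevector": 0.205,
        "Mahout": 1.102,
    },
    18: {
        "PennyLane": 5.394,
        "Qiskit Initialize": 354.829,
        "Qiskit Statevector": 5.892,
        "Mahout": 11.444,
    },
}


def speedup(baseline_ms: float, mahout_ms: float) -> float:
    """Assumed definition: values above 1 mean Mahout is faster than the baseline."""
    return baseline_ms / mahout_ms


def growth(name: str, small: int = 12, large: int = 18) -> float:
    """Factor by which a framework's per-vector latency grew between two runs."""
    return runs[large][name] / runs[small][name]


if __name__ == "__main__":
    for qubits, latency in runs.items():
        for name, ms in latency.items():
            if name != "Mahout":
                print(f"{qubits} qubits, vs {name}: {speedup(ms, latency['Mahout']):.2f}x")

    # The vector length grows by 2^(18-12) = 64x between the two runs.
    for name in runs[12]:
        print(f"{name}: latency grew {growth(name):.1f}x from 12 to 18 qubits")
```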

rich7420

@400Ping thanks for the patch

qdp/qdp-python/benchmark/benchmark_latency.md

qdp/qdp-python/benchmark/benchmark_latency.py

Signed-off-by: 400Ping <fourhundredping@gmail.com>

400Ping · 2026-01-05T01:29:43Z

cc @guan404ming PTAL

guan404ming

Looks nice!

guan404ming · 2026-01-05T04:31:03Z

cc @rich7420

ryankert01 · 2026-01-05T07:37:08Z

On Colab, any thoughts on why we are slower than PennyLane?

--qubits 12 --batches 20 --batch-size 8 --prefetch 4

Generating 160 samples of 12 qubits...
  Batch size   : 8
  Vector length: 4096
  Batches      : 20
  Prefetch     : 4
  Frameworks   : pennylane, qiskit-init, qiskit-statevector, mahout
  Generated 160 samples
  PennyLane/Qiskit format: 5.00 MB
  Mahout format: 5.00 MB

======================================================================
DATA-TO-STATE LATENCY BENCHMARK: 12 Qubits, 160 Samples
======================================================================

[PennyLane] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0689 s (0.431 ms/vector)

[Qiskit Initialize] Full Pipeline (DataLoader -> GPU)...
  Total Time: 19.0670 s (119.169 ms/vector)

[Qiskit Statevector] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0230 s (0.144 ms/vector)

[Mahout] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.2116 s (1.322 ms/vector)

======================================================================
LATENCY (Lower is Better)
Samples: 160, Qubits: 12
======================================================================
Qiskit Statevector      0.144 ms/vector
PennyLane               0.431 ms/vector
Mahout                  1.322 ms/vector
Qiskit Initialize     119.169 ms/vector
----------------------------------------------------------------------
Speedup vs PennyLane:            0.33x
Speedup vs Qiskit Init:          90.13x
Speedup vs Qiskit Statevec:       0.11x
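
This does not explain the gap, but to compare runs from different machines side by side, the summary table can be parsed out of the pasted output. A small sketch follows; the regular expression is written against the log format shown in this thread and the function name is hypothetical. It is not part of `benchmark_latency.py`.

```python
import re

# Matches summary rows such as "Qiskit Statevector      0.144 ms/vector".
# The "Total Time: ... (0.431 ms/vector)" lines end with ")" and are skipped.
SUMMARY_LINE = re.compile(r"^(?P<name>\S.*?)\s+(?P<ms>\d+\.\d+) ms/vector$")


def parse_latency_table(text: str) -> dict[str, float]:
    """Return {framework name: ms per vector} from pasted benchmark output."""
    results = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            results[match["name"]] = float(match["ms"])
    return results


# Summary rows copied from the 12-qubit runs in this thread.
colab = parse_latency_table("""
  Total Time: 0.2116 s (1.322 ms/vector)
Qiskit Statevector      0.144 ms/vector
PennyLane               0.431 ms/vector
Mahout                  1.322 ms/vector
Qiskit Initialize     119.169 ms/vector
""")
local = parse_latency_table("""
Qiskit Statevector      0.205 ms/vector
PennyLane               0.401 ms/vector
Mahout                  1.102 ms/vector
Qiskit Initialize      65.985 ms/vector
""")

if __name__ == "__main__":
    for name in colab:
        print(f"{name}: Colab / local = {colab[name] / local[name]:.2f}")
```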

The Scaling Test (Latency vs. Qubits)

4022aeb

Signed-off-by: 400Ping <fourhundredping@gmail.com>

fix pre-commit

2813d40

Signed-off-by: 400Ping <fourhundredping@gmail.com>

rich7420 reviewed Jan 4, 2026

qdp/qdp-python/benchmark/benchmark_latency.md (resolved)

qdp/qdp-python/benchmark/benchmark_latency.py (outdated, resolved)

guan404ming mentioned this pull request Jan 4, 2026

[QDP] Initiate project QDP #786

Draft

14 tasks

[Chore] make initialization clearer & clearfy doc

1735c04

Signed-off-by: 400Ping <fourhundredping@gmail.com>

guan404ming approved these changes Jan 5, 2026


ryankert01 approved these changes Jan 5, 2026
