@400Ping 400Ping commented Jan 1, 2026

Purpose of PR

  • Measure loading time for state sizes ranging from N=10 to N=28 qubits.
  • Hypothesis: Circuit-based methods will degrade exponentially (O(2^N) gates) or linearly (O(N) depth), while QDP will remain effectively constant until PCIe bandwidth saturates.
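
To make the N=10 to N=28 range concrete, here is a rough sketch of how the state-vector length and per-sample size grow with qubit count. It assumes 8-byte float64 amplitudes, which is only inferred from the 5.00 MB reported for 160 samples of 12 qubits in the benchmark output in this thread; the helper names are illustrative and not part of the benchmark script.

```python
# Assumption: each amplitude is stored as an 8-byte float64 value. This is
# inferred from the dataset sizes printed in this thread, not from the code.
BYTES_PER_AMPLITUDE = 8


def state_vector_length(num_qubits: int) -> int:
    """Number of amplitudes in an N-qubit state vector (2**N)."""
    return 1 << num_qubits


def sample_size_mib(num_qubits: int) -> float:
    """Size of one sample in MiB under the float64 assumption."""
    return state_vector_length(num_qubits) * BYTES_PER_AMPLITUDE / 2**20


if __name__ == "__main__":
    # Walk the qubit range named in the PR description in steps of two.
    for n in range(10, 29, 2):
        print(
            f"N={n:2d}  length={state_vector_length(n):>11,d}  "
            f"per-sample={sample_size_mib(n):10.3f} MiB"
        )
```

Under that assumption a single N=28 sample is already 2 GiB, so batch sizes at the top of the range would need to be chosen with GPU memory in mind.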

Related Issues or PRs

Closes #740

Changes Made

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Breaking Changes

  • Yes
  • No

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes
  • Successfully built and ran all unit tests or manual tests locally
  • PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
  • Code follows ASF guidelines

Signed-off-by: 400Ping <fourhundredping@gmail.com>
Author

400Ping commented Jan 2, 2026

Signed-off-by: 400Ping <fourhundredping@gmail.com>
Contributor

rich7420 commented Jan 4, 2026

My results:

uv run python benchmark/benchmark_latency.py --qubits 12 --batches 20 --batch-size 8 --prefetch 4
Generating 160 samples of 12 qubits...
  Batch size   : 8
  Vector length: 4096
  Batches      : 20
  Prefetch     : 4
  Frameworks   : pennylane, qiskit-init, qiskit-statevector, mahout
  Generated 160 samples
  PennyLane/Qiskit format: 5.00 MB
  Mahout format: 5.00 MB

======================================================================
DATA-TO-STATE LATENCY BENCHMARK: 12 Qubits, 160 Samples
======================================================================

[PennyLane] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0641 s (0.401 ms/vector)

[Qiskit Initialize] Full Pipeline (DataLoader -> GPU)...
  Total Time: 10.5576 s (65.985 ms/vector)

[Qiskit Statevector] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0329 s (0.205 ms/vector)

[Mahout] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.1763 s (1.102 ms/vector)

======================================================================
LATENCY (Lower is Better)
Samples: 160, Qubits: 12
======================================================================
Qiskit Statevector      0.205 ms/vector
PennyLane               0.401 ms/vector
Mahout                  1.102 ms/vector
Qiskit Initialize      65.985 ms/vector
----------------------------------------------------------------------
Speedup vs PennyLane:            0.36x
Speedup vs Qiskit Init:          59.89x
Speedup vs Qiskit Statevec:       0.19x

uv run python benchmark/benchmark_latency.py --qubits 18 --batches 20 --batch-size 32 --prefetch 4
Generating 640 samples of 18 qubits...
  Batch size   : 32
  Vector length: 262144
  Batches      : 20
  Prefetch     : 4
  Frameworks   : pennylane, qiskit-init, qiskit-statevector, mahout
  Generated 640 samples
  PennyLane/Qiskit format: 1280.00 MB
  Mahout format: 1280.00 MB

======================================================================
DATA-TO-STATE LATENCY BENCHMARK: 18 Qubits, 640 Samples
======================================================================

[PennyLane] Full Pipeline (DataLoader -> GPU)...
  Total Time: 3.4523 s (5.394 ms/vector)

[Qiskit Initialize] Full Pipeline (DataLoader -> GPU)...
  Total Time: 227.0904 s (354.829 ms/vector)

[Qiskit Statevector] Full Pipeline (DataLoader -> GPU)...
  Total Time: 3.7708 s (5.892 ms/vector)

[Mahout] Full Pipeline (DataLoader -> GPU)...
  Total Time: 7.3243 s (11.444 ms/vector)

======================================================================
LATENCY (Lower is Better)
Samples: 640, Qubits: 18
======================================================================
PennyLane               5.394 ms/vector
Qiskit Statevector      5.892 ms/vector
Mahout                 11.444 ms/vector
Qiskit Initialize     354.829 ms/vector
----------------------------------------------------------------------
Speedup vs PennyLane:            0.47x
Speedup vs Qiskit Init:          31.01x
Speedup vs Qiskit Statevec:       0.51x
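
As a rough check against the PCIe-saturation hypothesis, the Mahout rows above can be turned into an effective end-to-end throughput (dataset size divided by total time). This is only a sketch: the figure covers the whole DataLoader-to-GPU pipeline rather than the host-to-device copy alone, and the function name is illustrative.

```python
# Figures copied from the Mahout rows of the two runs above:
# (qubits, dataset size in MB as printed by the script, total time in seconds)
MAHOUT_RUNS = [
    (12, 5.00, 0.1763),
    (18, 1280.00, 7.3243),
]


def effective_throughput_mb_s(size_mb: float, total_s: float) -> float:
    """End-to-end throughput in MB/s; this is NOT an isolated PCIe measurement."""
    return size_mb / total_s


if __name__ == "__main__":
    for qubits, size_mb, total_s in MAHOUT_RUNS:
        rate = effective_throughput_mb_s(size_mb, total_s)
        print(f"{qubits} qubits: {rate:.1f} MB/s end to end")
```

Both runs come out at roughly 28 MB/s and 175 MB/s, far below the multi-GB/s range of a PCIe 3.0 x16 link, which suggests these runs are not yet transfer-bound. Profiling would be needed to confirm where the time goes.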

Contributor

@rich7420 rich7420 left a comment

@400Ping thanks for the patch

@guan404ming guan404ming mentioned this pull request Jan 4, 2026
14 tasks
Signed-off-by: 400Ping <fourhundredping@gmail.com>
Author

400Ping commented Jan 5, 2026

cc @guan404ming PTAL

Member

@guan404ming guan404ming left a comment

Looks nice!

@guan404ming
Member

cc @rich7420

Contributor

ryankert01 commented Jan 5, 2026

On Colab. Any thoughts on why we are slower than PennyLane?

--qubits 12 --batches 20 --batch-size 8 --prefetch 4

Generating 160 samples of 12 qubits...
  Batch size   : 8
  Vector length: 4096
  Batches      : 20
  Prefetch     : 4
  Frameworks   : pennylane, qiskit-init, qiskit-statevector, mahout
  Generated 160 samples
  PennyLane/Qiskit format: 5.00 MB
  Mahout format: 5.00 MB

======================================================================
DATA-TO-STATE LATENCY BENCHMARK: 12 Qubits, 160 Samples
======================================================================

[PennyLane] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0689 s (0.431 ms/vector)

[Qiskit Initialize] Full Pipeline (DataLoader -> GPU)...
  Total Time: 19.0670 s (119.169 ms/vector)

[Qiskit Statevector] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0230 s (0.144 ms/vector)

[Mahout] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.2116 s (1.322 ms/vector)

======================================================================
LATENCY (Lower is Better)
Samples: 160, Qubits: 12
======================================================================
Qiskit Statevector      0.144 ms/vector
PennyLane               0.431 ms/vector
Mahout                  1.322 ms/vector
Qiskit Initialize     119.169 ms/vector
----------------------------------------------------------------------
Speedup vs PennyLane:            0.33x
Speedup vs Qiskit Init:          90.13x
Speedup vs Qiskit Statevec:       0.11x
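
For reading the summary lines, the speedup figures appear to be the baseline latency divided by the Mahout latency, so values below 1.0 mean Mahout is slower. That definition is inferred from the printed numbers rather than taken from the script; a minimal sketch using the Colab figures above:

```python
# Per-vector latencies (ms) copied from the Colab run above.
LATENCY_MS = {
    "PennyLane": 0.431,
    "Qiskit Initialize": 119.169,
    "Qiskit Statevector": 0.144,
    "Mahout": 1.322,
}


def speedup_vs(baseline: str, target: str = "Mahout") -> float:
    """Assumed definition: baseline latency / target latency (>1 means target is faster)."""
    return LATENCY_MS[baseline] / LATENCY_MS[target]


if __name__ == "__main__":
    for name in ("PennyLane", "Qiskit Initialize", "Qiskit Statevector"):
        print(f"Speedup vs {name}: {speedup_vs(name):.2f}x")
```

The results match the printed summary to within rounding. Small last-digit differences are expected because the script presumably divides unrounded timings.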
