Comparing PyTorch and OMEinsum.jl for tensor network contractions.
| Device | Framework | Backend | Min Time (s) | Speedup vs PyTorch |
|---|---|---|---|---|
| GPU | PyTorch | CUDA | 0.107 | baseline |
| GPU | OMEinsum | cuTENSOR | 0.057 | 1.87× faster |
| GPU | OMEinsum | CUBLAS | 0.238 | 2.23× slower |
| CPU | PyTorch | — | 17.2 | baseline |
| CPU | OMEinsum | MKL | 14.7 | 1.17× faster |
Configuration: Float32, tensor network with 220 nodes (degree 3), contraction complexity 2^33.2.
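The speedup column is simply the ratio of minimum times against the PyTorch GPU baseline. As a quick sanity check in Python (small differences from the table's 1.87×/2.23× come from rounding of the displayed timings):

```python
# Recompute the speedup column from the reported minimum times.
times = {
    ("GPU", "PyTorch", "CUDA"): 0.107,
    ("GPU", "OMEinsum", "cuTENSOR"): 0.057,
    ("GPU", "OMEinsum", "CUBLAS"): 0.238,
    ("CPU", "PyTorch", None): 17.2,
    ("CPU", "OMEinsum", "MKL"): 14.7,
}
gpu_baseline = times[("GPU", "PyTorch", "CUDA")]
cutensor_speedup = gpu_baseline / times[("GPU", "OMEinsum", "cuTENSOR")]
cublas_slowdown = times[("GPU", "OMEinsum", "CUBLAS")] / gpu_baseline
print(f"cuTENSOR: {cutensor_speedup:.2f}x faster")  # table rounds to 1.87x
print(f"CUBLAS:   {cublas_slowdown:.2f}x slower")   # table rounds to 2.23x
```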
Requirements:

```bash
make init    # Initialize both environments
make update  # Update dependencies
```

```bash
# Run all benchmarks
make run-all

# Run individual benchmarks
make run-pytorch-gpu
make run-pytorch-cpu
make run-julia-gpu       # CUBLAS backend
make run-julia-cutensor  # cuTENSOR backend
make run-julia-cpu

# Generate reports
make summary  # Generate summary.json
make report   # Generate PDF report with plots
```

Customizable parameters:

```bash
make run-julia-gpu DEVICE_ID=1 REPEAT=20
```

Results are saved to the `results/` directory. Run `make help` for all options.
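The "Min Time" column and the `REPEAT` parameter suggest each benchmark reports the best of several timed runs. A minimal Python sketch of that methodology (illustrative only; `min_time` and its `warmup` parameter are hypothetical, not the repository's actual harness):

```python
import time

def min_time(fn, repeat=10, warmup=1):
    """Best wall-clock time over `repeat` runs, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # warm up caches / JIT compilation before timing
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

# Example with a cheap stand-in workload:
t = min_time(lambda: sum(i * i for i in range(10_000)), repeat=5)
print(f"min over 5 runs: {t:.6f} s")
```

Taking the minimum (rather than the mean) is a common choice for benchmarks, since it filters out interference from other processes.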
Python scripts contributed by @Fanerst. See the original discussion.