pip install mamba-ssm on aarch64 + PyTorch 2.10+ fails for two reasons:
- Wrong architecture. The PyPI wheel ships selective_scan_cuda.cpython-312-x86_64-linux-gnu.so, which can't load on aarch64.
- ABI mismatch. Even with MAMBA_FORCE_BUILD=TRUE, the PyPI release (v2.3.1) compiles against c10::cuda::CUDAStream::query(), a symbol removed in PyTorch 2.10+. The same issue affects causal-conv1d from PyPI.
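The architecture problem is visible straight from the extension's filename: CPython bakes the platform tag into the .so, and it has to match the host machine. A minimal sketch of that check (the helper name and regex are mine, not part of mamba-ssm):

```python
import platform
import re

def wheel_arch_matches(so_filename: str) -> bool:
    """Check whether a compiled extension's platform tag matches this host.

    Example filename: 'selective_scan_cuda.cpython-312-x86_64-linux-gnu.so'.
    """
    m = re.search(r"cpython-\d+-([a-z0-9_]+)-linux-gnu\.so$", so_filename)
    if m is None:
        return True  # no recognizable tag; can't tell, assume OK
    # platform.machine() reports e.g. 'aarch64' on DGX Spark, 'x86_64' on Intel/AMD
    return m.group(1) == platform.machine()

# On an aarch64 host this returns False for the wheel PyPI ships:
print(wheel_arch_matches("selective_scan_cuda.cpython-312-x86_64-linux-gnu.so"))
```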
Working fix: install both packages from their GitHub main branches, which include the ABI fixes:
export MAMBA_FORCE_BUILD=TRUE
export CAUSAL_CONV1D_FORCE_BUILD=TRUE
pip install --no-build-isolation "causal-conv1d @ git+https://github.com/Dao-AILab/causal-conv1d.git"
pip install --no-build-isolation "mamba-ssm @ git+https://github.com/state-spaces/mamba.git"
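After the two installs finish, a cheap sanity check before loading any model is to confirm both packages actually resolve in the current environment. A small sketch using only the standard library (the helper function is mine, not from either package):

```python
import importlib.util

def installed(names):
    """Return {module_name: bool} for whether each module is resolvable."""
    return {n: importlib.util.find_spec(n) is not None for n in names}

# Both should report True once the source builds above succeed:
print(installed(["mamba_ssm", "causal_conv1d"]))
```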
Tested on: DGX Spark GB10 (aarch64, SM121), nvcr.io/nvidia/pytorch:25.11-py3, PyTorch 2.10, CUDA 13.0. Successfully loads Nemotron-Nano-12B-v2-VL-BF16 (13.2B params) on GPU.
Dockerfile with working build: https://github.com/Sggin1/spark-ai-containers/tree/main/mamba-dev
Note: Tested on one configuration only (DGX Spark, pytorch:25.11-py3). I'm a hobbyist sharing what worked for me — there may be better approaches I'm not aware of.