Hello,
I'm trying to run CPM.cu on the Nvidia Jetson AGX Orin. I've referenced this Nvidia forum post for guidance: https://forums.developer.nvidia.com/t/running-lightweight-cpm-cu-on-the-nvidia-jetson-agx-orin-64gb-developer-kit/336681/1
Is the following command correct for running vanilla model inference (no quantization, no speculative decoding)?
python -m cpmcu.server --test-minicpm4 --no_apply_quant --no-apply-eagle --no-apply-eagle-quant
Thanks!