@mszpindler asked the following questions, which I have moved to a new issue because #77 is already merged:
srun --cpu-bind=v,mask_cpu=$CPU_BIND_MASKS singularity run -B ../resources/ai-guide-env.sqsh:/user-software:image-src=/ \
    $SIF bash -c 'export RANK=$SLURM_PROCID; export LOCAL_RANK=$SLURM_LOCALID; python ds_visiontransformer.py --deepspeed --deepspeed_config ds_config.json'
Here, $SLURM_PROCID and $SLURM_LOCALID also need to be escaped with a backslash, don't they?
srun --cpu-bind=v,mask_cpu=$CPU_BIND_MASKS singularity run -B ../resources/ai-guide-env.sqsh:/user-software:image-src=/ \
    $SIF bash -c 'export RANK=$SLURM_PROCID; export LOCAL_RANK=$SLURM_LOCALID; python ds_visiontransformer.py --deepspeed --deepspeed_config ds_config.json'
Shouldn't this be export RANK=\$SLURM_PROCID; export LOCAL_RANK=\$SLURM_LOCALID; instead?
The quoted lines are from:
LUMI-AI-Guide/5-multi-gpu-and-node/run_ds_srun_4.sh, lines 39 to 40 in f60c8e2
LUMI-AI-Guide/5-multi-gpu-and-node/run_ds_srun.sh, lines 39 to 40 in f60c8e2
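To make the quoting question concrete, here is a small sketch that can be run without Slurm. It assumes the srun line sits directly in the batch script (not inside a here-document or a double-quoted wrapper), which I have not verified against the full scripts. FAKE_PROCID is a hypothetical stand-in for SLURM_PROCID, and `env FAKE_PROCID=3` only imitates srun setting a per-task value; neither is part of the guide.

```shell
#!/usr/bin/env bash
# FAKE_PROCID is a hypothetical stand-in for SLURM_PROCID (assumption, not from the guide).
# The outer shell sees 0; `env FAKE_PROCID=3` imitates the per-task value srun would set.
export FAKE_PROCID=0

# Single quotes, no backslash: the outer shell leaves $FAKE_PROCID alone,
# so the inner bash expands it and sees the per-task value.
single=$(env FAKE_PROCID=3 bash -c 'export RANK=$FAKE_PROCID; echo "$RANK"')

# Single quotes with backslash: the inner bash receives \$ and treats it as a
# literal dollar sign, so RANK becomes the string $FAKE_PROCID.
single_escaped=$(env FAKE_PROCID=3 bash -c 'export RANK=\$FAKE_PROCID; echo "$RANK"')

# Double quotes, no backslash: the outer shell expands the variable too early
# and every task would get the outer value.
double=$(env FAKE_PROCID=3 bash -c "export RANK=$FAKE_PROCID; echo \"\$RANK\"")

# Double quotes with backslash: the outer shell strips the backslash and the
# inner bash performs the expansion.
double_escaped=$(env FAKE_PROCID=3 bash -c "export RANK=\$FAKE_PROCID; echo \"\$RANK\"")

echo "single=$single"                  # single=3
echo "single_escaped=$single_escaped"  # single_escaped=$FAKE_PROCID
echo "double=$double"                  # double=0
echo "double_escaped=$double_escaped"  # double_escaped=3
```

If that assumption holds, the single-quoted form in the quoted lines should already defer expansion to each task, and a backslash would only be needed if the command string were switched to double quotes.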