Problem
Description
The default configuration in qwen35_vl_122b_a10b_sft_config is inconsistent with its own docstring: the docstring advertises 4 nodes and 32 GPUs, but the default parallelism settings (TP=2, PP=6, EP=8) do not fit that GPU count.
Configuration Analysis
From the code:
def qwen35_vl_122b_a10b_sft_config(hf_path: str = "Qwen/Qwen3.5-122B-A10B") -> ConfigContainer:
    """Return a full SFT config for Qwen3.5-VL 122B-A10B (MoE).

    Default configuration: 4 nodes, 32 GPUs
    - TP=2, PP=6, EP=8
    - LR=2e-5 (full SFT)
    - Sequence length: 4096

    Args:
        hf_path: HuggingFace model ID or local path to model directory.
    """
    cfg = _sft_common_vlm()
    _qwen35_vl_apply_common(cfg, hf_path, tp=2, pp=6, max_lr=2e-5, min_lr=2e-6, gbs=36)
    _qwen35_vl_apply_moe(cfg, ep=8)
    _qwen35_vl_enable_recompute(cfg)
    return cfg
However, 32 GPUs cannot be divided evenly with PP=6 and EP=8: neither TP × PP = 12 nor PP × EP = 48 divides 32.
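The arithmetic can be checked with a small standalone sketch. It is not part of the library: `check_parallelism` is a hypothetical helper, and the two divisibility rules it applies (world size divisible by TP × PP and by PP × EP, with expert tensor parallelism taken as 1) are assumptions about how the GPU grid is laid out. The actual framework may enforce additional or different constraints.

```python
from math import gcd


def check_parallelism(world_size: int, tp: int, pp: int, ep: int) -> list[str]:
    """Return the violated divisibility constraints (hypothetical helper).

    Assumed rules, not taken from the library source:
      * world_size must be divisible by tp * pp (dense layers)
      * world_size must be divisible by pp * ep (expert layers, expert TP = 1)
    """
    problems = []
    if world_size % (tp * pp) != 0:
        problems.append(f"{world_size} GPUs not divisible by TP*PP = {tp * pp}")
    if world_size % (pp * ep) != 0:
        problems.append(f"{world_size} GPUs not divisible by PP*EP = {pp * ep}")
    return problems


def min_world_size(tp: int, pp: int, ep: int) -> int:
    """Smallest GPU count satisfying both assumed rules (their LCM)."""
    a, b = tp * pp, pp * ep
    return a * b // gcd(a, b)


# Defaults from qwen35_vl_122b_a10b_sft_config, against the documented 32 GPUs.
problems = check_parallelism(world_size=32, tp=2, pp=6, ep=8)
min_gpus = min_world_size(tp=2, pp=6, ep=8)

for p in problems:
    print(p)
print("minimum GPUs under these assumptions:", min_gpus)
```

Under these assumptions both checks fail for 32 GPUs, and the smallest count that passes is the least common multiple of 12 and 48.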
Minimal repro
Expected behavior
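This configuration needs 48 GPUs or more (PP × EP = 6 × 8 = 48), so the docstring's "4 nodes, 32 GPUs" should be corrected or the default parallelism values changed.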
Affected area
area:misc
Regression?
No
Environment
No response
Logs