[bug] : Default Config Parallelism Strategy Issue for Qwen3.5-VL 122B-A10B SFT #3747

@SophusDavid

Description

Problem

Description

The default configuration in qwen35_vl_122b_a10b_sft_config uses a parallelism strategy that does not fit the GPU count stated in its own docstring, so the code and the documentation are inconsistent.

Configuration Analysis

From the code:

def qwen35_vl_122b_a10b_sft_config(hf_path: str = "Qwen/Qwen3.5-122B-A10B") -> ConfigContainer:
    """Return a full SFT config for Qwen3.5-VL 122B-A10B (MoE).

    Default configuration: 4 nodes, 32 GPUs
    - TP=2, PP=6, EP=8
    - LR=2e-5 (full SFT)
    - Sequence length: 4096

    Args:
        hf_path: HuggingFace model ID or local path to model directory.
    """
    cfg = _sft_common_vlm()
    _qwen35_vl_apply_common(cfg, hf_path, tp=2, pp=6, max_lr=2e-5, min_lr=2e-6, gbs=36)
    _qwen35_vl_apply_moe(cfg, ep=8)
    _qwen35_vl_enable_recompute(cfg)
    return cfg

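However, with PP=6 and EP=8, the 32 GPUs stated in the docstring cannot be divided evenly: 32 is not divisible by PP=6, and therefore not by TP x PP = 12 either.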
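The arithmetic can be checked with a short standalone script. This is a minimal sketch, not code from the repository: the helper names are hypothetical, and the divisibility rules are an assumption modeled on how Megatron-style frameworks typically build process groups (world size a multiple of TP x CP x PP for dense layers and of ETP x EP x PP for expert layers, with context parallel and expert tensor parallel sizes assumed to be 1).

```python
from math import lcm  # Python 3.9+


def required_multiples(tp: int, pp: int, ep: int, etp: int = 1, cp: int = 1) -> dict:
    """Return the products the world size is ASSUMED to have to be a multiple of.

    These rules are an assumption for illustration, not taken from the repo.
    """
    return {
        "dense (TP*CP*PP)": tp * cp * pp,
        "expert (ETP*EP*PP)": etp * ep * pp,
    }


def is_valid_world_size(world_size: int, tp: int, pp: int, ep: int,
                        etp: int = 1, cp: int = 1) -> bool:
    """True if world_size is divisible by every assumed product."""
    products = required_multiples(tp, pp, ep, etp, cp).values()
    return all(world_size % p == 0 for p in products)


def min_world_size(tp: int, pp: int, ep: int, etp: int = 1, cp: int = 1) -> int:
    """Smallest GPU count that satisfies every assumed product."""
    return lcm(*required_multiples(tp, pp, ep, etp, cp).values())


# Values from the default config quoted above: TP=2, PP=6, EP=8.
print(is_valid_world_size(32, tp=2, pp=6, ep=8))  # 32 % 12 != 0
print(is_valid_world_size(48, tp=2, pp=6, ep=8))
print(min_world_size(tp=2, pp=6, ep=8))
```

Under these assumptions, 32 GPUs fails the check, while 48 is the smallest count that passes.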

Minimal repro

None.

Expected behavior

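This config needs 48 GPUs or more (PP x EP = 6 x 8 = 48), not the 32 stated in the docstring. Either the docstring or the default parallelism values should be corrected.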

Affected area

area:misc

Regression?

No

Environment

No response

Logs
