Problem
Checkpoint saving currently writes matformer_tier as the effective training tier (effective_tier), not the actual slicing level of the saved weights. When training a full-width checkpoint with --matformer-tier > 0 (runtime slicing), the saved config will set matformer_tier to that tier even though weights are full-width. This makes future auto-detection treat the checkpoint as sliced and can disable runtime slicing or cause incorrect tier handling.
Ref:
shared/client/src/state/cooldown.rs (writes matformer_tier from effective_tier)
Expected
matformer_tier should represent the actual slicing of the checkpoint weights. If weights are full-width, this should be 0. If weights are sliced, it should reflect the slice ratio.
Possible Approach
- When writing config.json, compute actual_tier from matformer_base_intermediate_size and intermediate_size (if ratio is power-of-two), and set matformer_tier to that value.
- Always store matformer_base_intermediate_size (if not present) to keep detection unambiguous.
- Fall back to 0 when ratios don’t match.
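The tier computation described in this list could look roughly like the sketch below. It assumes that tier t corresponds to intermediate_size = matformer_base_intermediate_size / 2^t, which the issue implies but does not state outright. The function name actual_matformer_tier is hypothetical and is not taken from cooldown.rs.

```rust
/// Hypothetical helper (not from the codebase): derive the tier that the
/// saved weights actually have from the two sizes stored in config.json.
///
/// Assumption: tier `t` means `intermediate_size == base / 2^t`.
/// Returns 0 (full width) whenever the sizes do not form a clean
/// power-of-two ratio, matching the fallback proposed above.
fn actual_matformer_tier(base_intermediate_size: u64, intermediate_size: u64) -> u32 {
    // Missing or zero sizes cannot describe a slice; treat as full width.
    if base_intermediate_size == 0 || intermediate_size == 0 {
        return 0;
    }
    // The base must be an exact multiple of the saved size. This also
    // rejects the case where the saved size is larger than the base.
    if base_intermediate_size % intermediate_size != 0 {
        return 0;
    }
    let ratio = base_intermediate_size / intermediate_size;
    if ratio.is_power_of_two() {
        // For a power of two, trailing_zeros() is log2(ratio);
        // a ratio of 1 yields tier 0.
        ratio.trailing_zeros()
    } else {
        0
    }
}
```

With this shape, a full-width checkpoint trained with --matformer-tier > 0 would still be written with matformer_tier: 0, because the value depends only on the saved sizes and not on effective_tier.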
Acceptance Criteria
- Full-width checkpoints always save matformer_tier: 0.
- Sliced checkpoints save the correct tier and base size.
- Auto-detection works for checkpoints saved during training.