@pcmoritz pcmoritz commented Jan 19, 2026

This brings down the step time of

uv run --with wandb --with tinker==0.3.0 sl_loop.py     base_url=http://localhost:8000     model_name=Qwen/Qwen3-30B-A3B lora_rank=1 max_length=512

against a server started with

uv run --extra gpu --extra tinker -m tx.tinker.api     --base-model Qwen/Qwen3-30B-A3B     --backend-config '{"max_lora_adapters": 2, "max_lora_rank": 8, "expert_parallel_size": 8, "train_micro_batch_size": 1, "shard_attention_heads": false}'

from 40s to 20s. I spent some time tuning the tile sizes and also tried separate tile sizes and configurations for different settings (e.g. the different projections, or the low-k setting for LoRA), but that made only a very small difference and isn't worth the added complexity for now.
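For anyone who wants to repeat the tile-size comparison, here is a minimal sketch of how such a sweep could be organized. It is not the code used in this PR: the `(tile_m, tile_k, tile_n)` triple, `time_call`, `sweep_tiles`, and `gain_over_default` are hypothetical names for illustration, and `measure` stands in for whatever runs the kernel or training step with a given tiling and returns its time in seconds.

```python
import time
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical tile description: (tile_m, tile_k, tile_n).
Tile = Tuple[int, int, int]


def time_call(fn: Callable[[], None], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls.

    For asynchronously dispatched work (e.g. JAX), `fn` should block until
    the result is ready, otherwise only the dispatch is timed.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def sweep_tiles(
    measure: Callable[[Tile], float],
    tile_m: Iterable[int],
    tile_k: Iterable[int],
    tile_n: Iterable[int],
) -> Dict[Tile, float]:
    """Measure every (tile_m, tile_k, tile_n) combination."""
    return {tile: measure(tile) for tile in product(tile_m, tile_k, tile_n)}


def gain_over_default(
    results: Dict[Tile, float], default: Tile
) -> Tuple[Tile, float]:
    """Return the fastest tile and its fractional speedup over `default`."""
    best = min(results, key=results.get)
    return best, 1.0 - results[best] / results[default]
```

Running the same sweep once per setting (for example per projection) and comparing the resulting gain against a single shared default gives a simple way to judge whether per-setting configurations justify the extra complexity.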
