
Submission: val_bpb=1.2459 (autoresearch-optimized)#343

Open
joeynyc wants to merge 98 commits into openai:main from joeynyc:main

Conversation


@joeynyc joeynyc commented Mar 21, 2026

Score: 1.2459 val_bpb | Size: 15.9 MB | 16,562 steps in 600 s on 8x H100 SXM. 97 autonomous experiments on an RTX 4080, validated on H100.
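For scale, the headline numbers imply roughly this per-step cost (a back-of-envelope sketch; it assumes the 16,562 steps exactly fill the 600 s wallclock budget):

```python
# Hypothetical throughput arithmetic from the PR's headline numbers.
steps, seconds = 16_562, 600
steps_per_sec = steps / seconds          # ~27.6 steps/s across 8 GPUs
ms_per_step = 1000 * seconds / steps     # ~36 ms per optimizer step
print(f"{steps_per_sec:.1f} steps/s, {ms_per_step:.1f} ms/step")
```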

joeynyc and others added 30 commits March 19, 2026 00:24
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…5min training

With a 5 min wallclock budget and ~700 ms/step, warmdown_ms=840s far exceeds the 300 s budget, so the LR ran at only 36% of base from step 1. Setting it to 200 gives full LR for ~53% of training.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
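The 36% and 53% figures fall out of a warmdown schedule of roughly this shape (a hypothetical reconstruction; the real trainer's schedule and parameter names may differ):

```python
def lr_multiplier(step, total_steps, warmdown_steps):
    # Assumed shape: full LR until the final warmdown window, then linear
    # decay to 0. If warmdown_steps > total_steps, the min() clamps from
    # step 0 and the run never reaches full LR -- the bug described above.
    remaining = total_steps - step
    return max(min(remaining / warmdown_steps, 1.0), 0.0)

# 300 s budget at ~700 ms/step -> ~428 steps; warmdown of 840 s -> ~1200 steps.
print(round(lr_multiplier(1, 428, 1200), 2))  # ~0.36: only 36% of base LR
# A 200-step warmdown leaves full LR for 228/428 ~= 53% of the run.
print(lr_multiplier(0, 428, 200))
```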
joeynyc and others added 30 commits March 19, 2026 07:11
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ch params)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fig)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r training)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tied the input and output embeddings to fit under the 16 MB compressed-size limit, then re-tuned hyperparameters to recover and improve val_bpb.

Changes from baseline:
- tie_embeddings: 0→1 (saves ~2MB compressed)
- tied_embed_lr: 0.05→0.01
- logit_softcap: 15→12
- grad_clip_norm: 0.25→0.5
- muon_momentum: 0.95→0.9
- muon_momentum_warmup_start: 0.85→0.8
- muon_momentum_warmup_steps: 200→100

Result at 300s: 13.25MB int8+zlib, val_bpb 1.4624

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
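The "int8+zlib" compressed-size metric can be measured in miniature like this (a hedged sketch: the function name and the symmetric per-tensor quantization scheme are assumptions, not the submission's actual packaging code):

```python
import random
import zlib

def int8_zlib_bytes(weights, level=9):
    # Assumed scheme: symmetric per-tensor int8 quantization (scale by
    # max |w| so values land in [-127, 127]), then zlib over the raw bytes.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    quantized = bytes(round(w / scale) & 0xFF for w in weights)
    return len(zlib.compress(quantized, level))

random.seed(0)
fake_weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
size = int8_zlib_bytes(fake_weights)
print(size, "compressed bytes for 10k weights")
```

Under this scheme, tying the embeddings removes an entire vocab-by-dim matrix from the checkpoint, which is consistent with the ~2 MB compressed saving claimed above.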
val_bpb improved from ~1.45 to 1.3878 at 600s.
Compressed size reduced from 16.26MB to 14.49MB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>