Submission: val_bpb=1.2459 (autoresearch-optimized) #343
Open
joeynyc wants to merge 98 commits into openai:main
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 0d01af3.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 1bea7a8.
…5min training
With a 5min wallclock and ~700ms/step, warmdown_ms=840s >> 300s budget. LR was only 36% of base from step 1. Setting to 200 gives full LR for ~53% of training.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
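The arithmetic in this commit message can be checked with a minimal sketch of a linear warmdown measured in wall-clock time. The function names and the schedule shape are assumptions made for illustration, not the training script's actual code. The sketch also assumes that "200" is a step count, converted to seconds at ~700ms/step; under that reading it reproduces the ~53% figure.

```python
def lr_scale(elapsed_s: float, budget_s: float, warmdown_s: float) -> float:
    """LR multiplier for a linear wall-clock warmdown (hypothetical helper).

    The multiplier stays at 1.0 until the remaining time falls below
    warmdown_s, then decays linearly to 0.0 at the end of the budget.
    """
    remaining_s = max(budget_s - elapsed_s, 0.0)
    return min(remaining_s / warmdown_s, 1.0)


def full_lr_fraction(budget_s: float, warmdown_s: float) -> float:
    """Fraction of the budget spent at full LR (hypothetical helper)."""
    return max(budget_s - warmdown_s, 0.0) / budget_s


BUDGET_S = 300.0  # 5min wallclock, from the commit message
STEP_S = 0.7      # ~700ms/step, from the commit message

# An 840s warmdown exceeds the whole budget, so the multiplier starts
# at 300/840, roughly 0.36, on the very first step.
print(round(lr_scale(0.0, BUDGET_S, 840.0), 2))

# Assumption: 200 steps at ~0.7s each is a 140s warmdown, which leaves
# 160s of the 300s budget, roughly 53%, at full LR.
print(round(full_lr_fraction(BUDGET_S, 200 * STEP_S), 2))
```

Under this schedule, any warmdown window longer than the wall-clock budget means the run never reaches the base learning rate, which matches the behavior the commit describes.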
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 241bebb.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 1283f5f.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 001f9d1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 1592cd0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 565fab7.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 2ea5f41.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 330be2e.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit a3a10e8.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit da8ad10.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fig)" This reverts commit 3ae6b77.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 2e4be26.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit a9e0e4a.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 32bda1d.
…ch params) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n to match params)" This reverts commit bed163a.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 9d5e866.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit b92ee31.
…fig) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rent config)" This reverts commit 03cbd19.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit f8498b0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 0487a1b.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 3cd5f39.
…r training) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… ~45s for training)" This reverts commit 95c2565.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit a0a1ea7.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tied embeddings to fit under 16MB compressed size limit, then optimized hyperparameters to recover and improve val_bpb.
Changes from baseline:
- tie_embeddings: 0→1 (saves ~2MB compressed)
- tied_embed_lr: 0.05→0.01
- logit_softcap: 15→12
- grad_clip_norm: 0.25→0.5
- muon_momentum: 0.95→0.9
- muon_momentum_warmup_start: 0.85→0.8
- muon_momentum_warmup_steps: 200→100
Result at 300s: 13.25MB int8+zlib, val_bpb 1.4624
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
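The changes listed in this commit can be written out as a pair of configuration dictionaries. The setting names and values come from the commit message. The dictionary layout and the `changed_settings` helper are hypothetical and only illustrate how such a diff against the baseline might be tracked.

```python
# Values before the commit, as listed in the commit message.
BASELINE = {
    "tie_embeddings": 0,
    "tied_embed_lr": 0.05,
    "logit_softcap": 15,
    "grad_clip_norm": 0.25,
    "muon_momentum": 0.95,
    "muon_momentum_warmup_start": 0.85,
    "muon_momentum_warmup_steps": 200,
}

# Values after the commit, as listed in the commit message.
TUNED = {
    "tie_embeddings": 1,
    "tied_embed_lr": 0.01,
    "logit_softcap": 12,
    "grad_clip_norm": 0.5,
    "muon_momentum": 0.9,
    "muon_momentum_warmup_start": 0.8,
    "muon_momentum_warmup_steps": 100,
}


def changed_settings(baseline: dict, tuned: dict) -> dict:
    """Return {name: (old, new)} for every setting whose value differs."""
    return {
        name: (baseline[name], tuned[name])
        for name in baseline
        if baseline[name] != tuned[name]
    }


if __name__ == "__main__":
    for name, (old, new) in changed_settings(BASELINE, TUNED).items():
        print(f"{name}: {old} -> {new}")
```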
val_bpb improved from ~1.45 to 1.3878 at 600s. Compressed size reduced from 16.26MB to 14.49MB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Score: 1.2459 val_bpb | Size: 15.9MB | 16,562 steps in 600s on 8xH100 SXM. 97 autonomous experiments on RTX 4080, validated on H100.
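For readers unfamiliar with the metric, bits per byte is conventionally derived from the mean per-token cross-entropy by converting nats to bits and normalizing by the byte count of the evaluated text. The sketch below shows that conversion. The function and argument names are hypothetical, the inputs are illustrative rather than figures from this PR, and the evaluation code in the repository may compute it differently.

```python
import math


def bits_per_byte(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean per-token cross-entropy (in nats) to bits per byte.

    Total bits = mean loss * token count / ln(2); dividing by the number
    of UTF-8 bytes in the evaluated text gives bits per byte.
    """
    total_bits = mean_loss_nats * num_tokens / math.log(2)
    return total_bits / num_bytes


if __name__ == "__main__":
    # Illustrative inputs only: a loss of ln(2) nats per token is exactly
    # one bit per token, so at two bytes per token the result is half a bit
    # per byte.
    print(bits_per_byte(math.log(2), num_tokens=50, num_bytes=100))
```

Because the result is normalized by bytes rather than tokens, it stays comparable across submissions that use different tokenizers.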