
Submission: val_bpb=1.2459 (autoresearch-optimized)#343

Open
joeynyc wants to merge 98 commits into openai:main from joeynyc:main

Conversation


@joeynyc joeynyc commented Mar 21, 2026

Score: 1.2459 val_bpb | Size: 15.9 MB | 16,562 steps in 600 s on 8x H100 SXM. 97 autonomous experiments on an RTX 4080, validated on H100.
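For scale, the headline numbers imply roughly this per-step cost (a back-of-envelope sketch; it assumes the 16,562 steps exactly fill the 600 s wallclock budget):

```python
# Hypothetical throughput arithmetic from the PR's headline numbers.
steps, seconds = 16_562, 600
steps_per_sec = steps / seconds          # ~27.6 steps/s across 8 GPUs
ms_per_step = 1000 * seconds / steps     # ~36 ms per optimizer step
print(f"{steps_per_sec:.1f} steps/s, {ms_per_step:.1f} ms/step")
```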

joeynyc and others added 30 commits March 19, 2026 00:24
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…5min training

With a 5 min wallclock budget and ~700 ms/step, warmdown_ms=840s far exceeds the 300 s budget, so the LR ran at only 36% of base from step 1. Setting it to 200 gives full LR for ~53% of training.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
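The 36% and 53% figures fall out of a warmdown schedule of roughly this shape (a hypothetical reconstruction; the real trainer's schedule and parameter names may differ):

```python
def lr_multiplier(step, total_steps, warmdown_steps):
    # Assumed shape: full LR until the final warmdown window, then linear
    # decay to 0. If warmdown_steps > total_steps, the min() clamps from
    # step 0 and the run never reaches full LR -- the bug described above.
    remaining = total_steps - step
    return max(min(remaining / warmdown_steps, 1.0), 0.0)

# 300 s budget at ~700 ms/step -> ~428 steps; warmdown of 840 s -> ~1200 steps.
print(round(lr_multiplier(1, 428, 1200), 2))  # ~0.36: only 36% of base LR
# A 200-step warmdown leaves full LR for 228/428 ~= 53% of the run.
print(lr_multiplier(0, 428, 200))
```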
joeynyc and others added 30 commits March 19, 2026 07:11
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ch params)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fig)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r training)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tied the input and output embeddings to fit under the 16 MB compressed-size limit, then re-tuned hyperparameters to recover and improve val_bpb.

Changes from baseline:
- tie_embeddings: 0→1 (saves ~2MB compressed)
- tied_embed_lr: 0.05→0.01
- logit_softcap: 15→12
- grad_clip_norm: 0.25→0.5
- muon_momentum: 0.95→0.9
- muon_momentum_warmup_start: 0.85→0.8
- muon_momentum_warmup_steps: 200→100

Result at 300s: 13.25MB int8+zlib, val_bpb 1.4624

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
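The "int8+zlib" compressed-size metric can be measured in miniature like this (a hedged sketch: the function name and the symmetric per-tensor quantization scheme are assumptions, not the submission's actual packaging code):

```python
import random
import zlib

def int8_zlib_bytes(weights, level=9):
    # Assumed scheme: symmetric per-tensor int8 quantization (scale by
    # max |w| so values land in [-127, 127]), then zlib over the raw bytes.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    quantized = bytes(round(w / scale) & 0xFF for w in weights)
    return len(zlib.compress(quantized, level))

random.seed(0)
fake_weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
size = int8_zlib_bytes(fake_weights)
print(size, "compressed bytes for 10k weights")
```

Under this scheme, tying the embeddings removes an entire vocab-by-dim matrix from the checkpoint, which is consistent with the ~2 MB compressed saving claimed above.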
val_bpb improved from ~1.45 to 1.3878 at 600s.
Compressed size reduced from 16.26MB to 14.49MB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>