
Non-record: Autoresearch Heads4 + Step-based LR + Sliding Window (1xH100) #344

Open
aryanbhosale wants to merge 1 commit into openai:main from aryanbhosale:submission/autoresearch-heads4

Conversation

@aryanbhosale

Summary

Non-record submission exploring automated architecture search for Parameter Golf.

Built on the current SOTA (10L, int5/int6, BigramHash, SmearGate, SWA) with 75+ automated experiments across Mac MLX and 1xH100 CUDA, driven by an autoresearch loop inspired by Karpathy's methodology.
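
The autoresearch loop above can be sketched as a greedy mutate-and-keep search over training configs. This is illustrative only: the PR does not describe the loop's internals, and `mutations`, `run_experiment`, and the greedy acceptance rule are all assumptions (`run_experiment` is taken to return validation BPB, lower is better).

```python
import random

def autoresearch_loop(baseline_cfg, mutations, run_experiment, budget=75):
    """Greedy mutate-and-keep search over training configs.

    Hypothetical sketch: `mutations` is a list of (key, value) config
    tweaks to try; `run_experiment` trains/evals a config and returns
    val BPB (lower is better). Keeps a mutation only if it improves.
    """
    best_cfg = dict(baseline_cfg)
    best_bpb = run_experiment(best_cfg)
    for _ in range(budget):
        cfg = dict(best_cfg)
        key, value = random.choice(mutations)  # e.g. ("NUM_HEADS", 4)
        cfg[key] = value
        bpb = run_experiment(cfg)
        if bpb < best_bpb:  # keep only strict improvements
            best_cfg, best_bpb = cfg, bpb
    return best_cfg, best_bpb
```

In practice each `run_experiment` call here would be a short proxy run (e.g. a few hundred steps on MLX), with only the winners re-validated on the H100.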

Key Findings

| Technique | Relative BPB change | Notes |
| --- | --- | --- |
| NUM_HEADS=4, head_dim=128 | -0.095 | Fewer, larger heads |
| Step-based LR schedule | -0.483 | vs wallclock-based warmdown |
| BigramHash(16384) | -0.025 | vs 10240 |
| MATRIX_LR=0.03 | -0.003 | vs 0.02 |
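
The step-based schedule can be sketched as follows. The PR only states that scheduling by step count beat wallclock-based warmdown; the warmup length, warmdown fraction, and piecewise-linear shape below are assumptions for illustration (base LR 0.03 matches the MATRIX_LR finding, but the real schedule may differ).

```python
def lr_at_step(step, total_steps=800, base_lr=0.03,
               warmup_steps=80, warmdown_frac=0.4):
    """Step-based LR: linear warmup, flat plateau, linear warmdown.

    Hypothetical shape. Keying the schedule off `step` rather than
    elapsed wallclock time makes runs reproducible across hardware
    with different step throughput.
    """
    warmdown_start = int(total_steps * (1 - warmdown_frac))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps   # warmup
    if step < warmdown_start:
        return base_lr                               # plateau
    remaining = total_steps - step
    return base_lr * remaining / (total_steps - warmdown_start)  # decay to 0
```

A wallclock-based warmdown would instead compute the decay fraction from `time.time()`, which shifts the effective schedule whenever step time changes between machines.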

Results (1xH100, 800 steps, 600s)

  • Pre-quant val_bpb: 1.2913
  • Post-quant val_bpb (sliding window stride=256): 1.2756
  • Artifact size: 17.4MB (over the 16MB budget; needs int4/int5 MLP compression)
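
The stride=256 sliding-window eval above scores each token with close to the maximum available left context. A minimal sketch of the window bookkeeping, assuming a window size of 1024 (the PR only specifies the stride):

```python
def sliding_windows(n_tokens, window=1024, stride=256):
    """Yield (start, end, score_from) spans for sliding-window eval.

    Each window sees up to `window` tokens of context but only the
    tokens in [score_from, end) contribute to the loss, so every token
    is scored exactly once. `window=1024` is an assumed size.
    """
    spans = []
    covered = 0   # tokens already scored
    start = 0
    while covered < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end, covered))  # score [covered, end)
        covered = end
        start += stride
    return spans
```

After the first window, each subsequent window scores only its last `stride` tokens, so a smaller stride buys more context per scored token at the cost of proportionally more forward passes.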

Known Issues

  • Artifact exceeds 16MB due to head_dim=128 increasing param count. Compression optimization (int4/int5 MLP weights) needed to fit budget.
  • Tested on 1xH100 only. Requesting compute grant for 8xH100 validation.
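
The int4/int5 MLP compression proposed above could look like symmetric n-bit quantization. This is a minimal per-tensor sketch under assumed details; the repo's actual scheme (per-channel scales, bit packing, which layers keep int5/int6) is not specified in this PR.

```python
def quantize_symmetric(weights, bits):
    """Symmetric n-bit quantization of a flat list of floats.

    Maps values to integers in [-(2^(bits-1)-1), 2^(bits-1)-1] with a
    single scale. Hypothetical sketch of the proposed int4/int5 MLP
    weight compression.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid scale=0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from quantized integers."""
    return [qi * scale for qi in q]
```

At int4 each weight needs 4 bits instead of 5-6, which is roughly the 10-20% size reduction required to bring a 17.4MB artifact under the 16MB budget, traded against extra quantization error in the MLP weights.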

Negative Results (also valuable)

  • LoRA test-time training: worse by 0.09 BPB
  • Block-wise weight sharing: worse + 2x slower
  • SwiGLU activation: worse than relu^2
  • MQA (NUM_KV_HEADS=1): worse quality
  • seq_len=4096: too slow per step

