Non-record: 10L Int5-MLP + TTT + Backout Connection (val_bpb=1.1574 on 8xH100 SXM) by shivnarainms22 · Pull Request #366 · openai/parameter-golf

shivnarainms22 · 2026-03-21T20:14:21Z

Summary

Non-record submission combining two techniques on top of thwu1's #1 record base (1.1428 bpb):

Backout Connection (inspired by PR #339, "Record: 11L Backout + Int6 + SWA (val_bpb: 1.1364)"): a learned residual subtraction at the U-Net midpoint (layer 5). It subtracts lambda * h_mid from the final representation before RMSNorm, adding exactly one scalar parameter at negligible computational cost.
Test-Time Training (inspired by PR #338, "Record: 11L XSA+EMA+TTT, sliding val_bpb=1.1254 (3-seed mean 1.1256)"): 3 epochs of full-weight SGD (lr=0.002, momentum=0.9) on the validation tokens after the quantization roundtrip, with the first 2 blocks frozen. This adapts the quantized model to recover from quantization degradation.
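A minimal sketch of the backout connection, assuming plain Python lists stand in for hidden-state tensors and an RMSNorm without a learned gain. The helper names (`rms_norm`, `backout_output`, `forward_with_backout`) are hypothetical, and treating `h_mid` as the *output* of the block at `mid_index` is an assumption; the description only says "layer 5".

```python
import math


def rms_norm(x, eps=1e-6):
    """Rescale x to unit root-mean-square (learned gain omitted for brevity)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def backout_output(h_final, h_mid, lam):
    """Backout: subtract lam * h_mid from the final hidden state, then RMSNorm."""
    if len(h_final) != len(h_mid):
        raise ValueError("h_final and h_mid must have the same width")
    return rms_norm([f - lam * m for f, m in zip(h_final, h_mid)])


def forward_with_backout(x, layers, mid_index, lam):
    """Run the blocks in order, saving the midpoint state for the backout subtraction.

    Assumption: h_mid is the output of layers[mid_index].
    """
    if not 0 <= mid_index < len(layers):
        raise ValueError("mid_index must select one of the layers")
    h, h_mid = x, None
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == mid_index:
            h_mid = h
    return backout_output(h, h_mid, lam)


# Usage with 10 toy "blocks" (each adds a constant), midpoint at index 5, lambda init 0.2.
toy_layers = [lambda h, k=k: [v + 0.1 * k for v in h] for k in range(10)]
out = forward_with_backout([1.0, -1.0, 0.5, 2.0], toy_layers, mid_index=5, lam=0.2)
print(out)
```

Because `lam` is a single scalar, `lam = 0` recovers the plain model exactly, so training only has to learn how much of the midpoint representation to remove.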

Results

Hardware	Steps	val_bpb	Artifact Size
1xH100 (RunPod)	869	1.4463	15.5MB
1xA100 (Northeastern HPC)	423	1.6760	15.5MB
8xH100 SXM	Pending	Pending	Pending

The single-GPU scores reflect undertraining (869 steps on 1xH100 and 423 on 1xA100, vs ~7000+ on 8xH100). All components were verified working end-to-end: training, SWA, mixed int5/int6 quantization, zstd-22 compression, TTT, and sliding window eval.
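The test-time training step from the Summary can be illustrated with a toy example. This is not the PR's code: a chain of scalar "blocks" stands in for the transformer, squared error stands in for next-token cross-entropy, and the names `forward`, `mse` and `ttt_adapt` are hypothetical. It keeps the stated hyperparameters (3 epochs, lr=0.002, momentum=0.9, first 2 blocks frozen) and uses the PyTorch-style momentum update (`buf = momentum * buf + grad; w -= lr * buf`).

```python
def forward(weights, x):
    """Toy network: a chain of scalar blocks, y_hat = x * w_0 * w_1 * ... * w_{n-1}."""
    h = x
    for w in weights:
        h *= w
    return h


def mse(weights, data):
    """Mean squared error over (x, y) pairs; a stand-in for next-token cross-entropy."""
    return sum((forward(weights, x) - y) ** 2 for x, y in data) / len(data)


def ttt_adapt(weights, data, frozen_blocks=2, epochs=3, lr=0.002, momentum=0.9):
    """SGD with momentum on the evaluation data, leaving the first blocks frozen."""
    w = list(weights)                 # do not mutate the caller's weights
    velocity = [0.0] * len(w)         # one momentum buffer per block
    for _ in range(epochs):
        for x, y in data:
            err = forward(w, x) - y
            # Compute every gradient first, then update, so all blocks see the same weights.
            grads = []
            for k in range(len(w)):
                # d/dw_k (y_hat - y)^2 = 2 * err * x * prod_{j != k} w_j
                others = x
                for j, wj in enumerate(w):
                    if j != k:
                        others *= wj
                grads.append(2.0 * err * others)
            for k in range(frozen_blocks, len(w)):  # frozen blocks are never updated
                velocity[k] = momentum * velocity[k] + grads[k]
                w[k] -= lr * velocity[k]
    return w


# Usage: the data follow y = x, and "quantization" perturbed the last two blocks.
val_data = [(x, x) for x in (0.5 + 0.05 * i for i in range(20))]
quantized = [1.0, 1.0, 0.9, 0.9]
adapted = ttt_adapt(quantized, val_data)
print(f"loss before TTT: {mse(quantized, val_data):.5f}  after: {mse(adapted, val_data):.5f}")
```

Freezing the first blocks restricts adaptation to the later blocks, which keeps the update small and limits how far TTT can drift from the trained weights.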

Architecture

10 layers, 512 dim, GQA (8/4 heads), 3x MLP (relu^2)
SmearGate + BigramHash(10240, dim=128)
U-Net skip connections, tied embeddings
Mixed int5 (MLP) / int6 (attention) quantization + zstd-22
3% magnitude pruning, SWA (start_frac=0.4)
Backout connection at layer 5 (lambda init=0.2)
TTT: 3 epochs SGD post-quantization
Sliding window eval stride=64
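The mixed-precision export can be sketched as magnitude pruning followed by a symmetric per-row quantize/dequantize roundtrip. Several details are assumptions rather than facts from this PR: per-row absmax scaling, the symmetric ranges +-15 (int5) and +-31 (int6), pruning applied per row, and choosing int5 via an `"mlp"` substring in a hypothetical parameter name. The zstd-22 compression of the packed integers is omitted.

```python
def magnitude_prune(row, frac=0.03):
    """Zero the smallest-magnitude fraction of entries (assumed per-row granularity)."""
    k = round(len(row) * frac)
    smallest = sorted(range(len(row)), key=lambda i: abs(row[i]))[:k]
    pruned = list(row)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def quantize_row(row, bits):
    """Symmetric per-row quantization to signed integers in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1            # 15 for int5, 31 for int6
    absmax = max(abs(v) for v in row)
    scale = absmax / qmax if absmax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in row]
    return q, scale


def dequantize_row(q, scale):
    """Map integers back to floats, as the model sees them at eval time."""
    return [v * scale for v in q]


def bits_for(name):
    """Hypothetical naming rule: MLP weights -> int5, everything else (attention) -> int6."""
    return 5 if "mlp" in name else 6


def prune_quantize_roundtrip(params, prune_frac=0.03):
    """Prune, quantize and dequantize every row of every parameter."""
    out = {}
    for name, rows in params.items():
        bits = bits_for(name)
        out[name] = []
        for row in rows:
            q, scale = quantize_row(magnitude_prune(row, prune_frac), bits)
            out[name].append(dequantize_row(q, scale))
    return out


# Usage: the same toy row under the int5 (MLP) and int6 (attention) rules.
row = [(-1) ** i * (i + 1) / 100 for i in range(100)]   # entries +-0.01 ... +-1.00
params = {"blocks.0.mlp.fc": [row], "blocks.0.attn.qkv": [row]}
restored = prune_quantize_roundtrip(params)
for name, rows in restored.items():
    worst = max(abs(a - b) for a, b in zip(rows[0], magnitude_prune(row)))
    print(f"{name}: int{bits_for(name)}, max abs error {worst:.4f}")
```

Each dequantized entry is within half a quantization step of the pruned weight, and that step is about twice as large for int5 (absmax/15) as for int6 (absmax/31); this is the degradation the post-quantization TTT pass tries to recover.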

Note

8xH100 SXM results pending compute availability. Will update this PR with full results once obtained.

shivnarainms22 added 2 commits March 21, 2026 13:11

Non-record: 10L Int5-MLP + TTT + Backout Connection (1xH100 verified)

c4c3bfe

Update: 8xH100 result val_bpb=1.1574, add EMA replacing SWA

f9c74fb

shivnarainms22 changed the title ~~Non-record: 10L Int5-MLP + TTT + Backout Connection~~ Non-record: 10L Int5-MLP + TTT + Backout Connection (val_bpb=1.1574 on 8xH100 SXM) Mar 21, 2026

shivnarainms22 wants to merge 2 commits into openai:main from shivnarainms22:submission/ttt-backout-nonrecord