
Non-record: 10L Int5-MLP + TTT + Backout Connection (val_bpb=1.1574 on 8xH100 SXM)#366

Open
shivnarainms22 wants to merge 2 commits intoopenai:mainfrom
shivnarainms22:submission/ttt-backout-nonrecord

Conversation

@shivnarainms22

Summary

Non-record submission combining two techniques on top of thwu1's #1 record base (1.1428 bpb): TTT (3 epochs of SGD post-quantization) and a backout connection at layer 5.

Results

| Hardware | Steps | val_bpb | Artifact Size |
|---|---|---|---|
| 1xH100 (RunPod) | 869 | 1.4463 | 15.5 MB |
| 1xA100 (Northeastern HPC) | 423 | 1.6760 | 15.5 MB |
| 8xH100 SXM | Pending | Pending | Pending |

These scores reflect undertraining on a single GPU (~869 steps vs ~7000+ on 8xH100). All components were verified working end-to-end: training, SWA, mixed int5/int6 quantization, zstd-22 compression, TTT, and sliding-window eval.
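The mixed-precision quantization plus compression step can be illustrated with a minimal sketch: symmetric per-tensor quantization (int5 for MLP weights, int6 for attention) followed by entropy coding of the integer stream. This is not the submission's code; the shapes are toy-sized, and stdlib zlib stands in for zstd at level 22.

```python
import numpy as np
import zlib

def quantize_symmetric(w: np.ndarray, bits: int):
    """Per-tensor symmetric quantization to a signed `bits`-bit grid."""
    qmax = 2 ** (bits - 1) - 1            # 15 for int5, 31 for int6
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
mlp_w = (rng.standard_normal((512, 2048)) * 0.02).astype(np.float32)
attn_w = (rng.standard_normal((512, 512)) * 0.02).astype(np.float32)

q5, s5 = quantize_symmetric(mlp_w, bits=5)    # MLP weights -> int5
q6, s6 = quantize_symmetric(attn_w, bits=6)   # attention weights -> int6

# The submission compresses the packed artifact with zstd level 22;
# zlib (stdlib) stands in here to show that the low-bit integer
# stream compresses well below its raw byte size.
blob = zlib.compress(q5.tobytes() + q6.tobytes(), level=9)

# worst-case reconstruction error is half a quantization step
err = np.abs(dequantize(q5, s5) - mlp_w).max()
```

In the real artifact the int5/int6 values would also be bit-packed rather than stored one-per-byte; that detail is omitted here.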

Architecture

  • 10 layers, 512 dim, GQA (8/4 heads), 3x MLP (relu^2)
  • SmearGate + BigramHash(10240, dim=128)
  • U-Net skip connections, tied embeddings
  • Mixed int5 (MLP) / int6 (attention) quantization + zstd-22
  • 3% magnitude pruning, SWA(start_frac=0.4)
  • Backout connection at layer 5 (lambda init=0.2)
  • TTT: 3 epochs SGD post-quantization
  • Sliding window eval stride=64
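The PR does not include the implementation, but assuming "backout connection at layer 5" means subtracting a learnable fraction lambda (init 0.2) of the layer-5 hidden state from the final residual stream, a toy sketch looks like this (plain linear maps stand in for transformer blocks; all names are hypothetical):

```python
import numpy as np

def forward_with_backout(x, layers, backout_layer=5, lam=0.2):
    """Sketch of a backout connection: remember the hidden state at
    `backout_layer` and subtract lam * that state from the final
    output, letting the model 'back out' part of the mid-network
    representation. lam would be a learnable scalar (init 0.2) in
    the real model; it is a fixed constant here."""
    h = x
    saved = None
    for i, layer in enumerate(layers, start=1):
        h = layer(h)
        if i == backout_layer:
            saved = h.copy()
    return h - lam * saved

# toy 10-layer stack of near-identity linear maps
rng = np.random.default_rng(1)
dim = 8
Ws = [np.eye(dim) + 0.01 * rng.standard_normal((dim, dim)) for _ in range(10)]
layers = [lambda h, W=W: h @ W for W in Ws]

x = rng.standard_normal((4, dim))
y = forward_with_backout(x, layers, backout_layer=5, lam=0.2)
```

With lam=0 this reduces to the plain 10-layer forward pass, which is what makes a small positive init like 0.2 a low-risk addition.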

Note

8xH100 SXM results pending compute availability. Will update this PR with full results once obtained.
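The "TTT: 3 epochs SGD post-quantization" step above is listed without code. As a rough illustration of the idea, recovering some of the quantization error with a few SGD epochs after quantizing, here is a toy sketch using a linear model with MSE loss in place of the full network (the data, shapes, and learning rate are all hypothetical):

```python
import numpy as np

def ttt_sgd(w_q, scale, X, Y, epochs=3, lr=0.1):
    """Test-time training sketch: start from the dequantized weights
    and run a few epochs of plain full-batch SGD. The submission runs
    3 epochs post-quantization on the real model."""
    w = w_q.astype(np.float32) * scale
    for _ in range(epochs):
        grad = X.T @ (X @ w - Y) / len(X)   # gradient of MSE loss
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
w_true = rng.standard_normal(16).astype(np.float32)
X = rng.standard_normal((64, 16)).astype(np.float32)
Y = X @ w_true

# crude int5 quantization of the weights -- the error TTT tries to recover
scale = np.abs(w_true).max() / 15
w_q = np.clip(np.round(w_true / scale), -16, 15).astype(np.int8)

loss_before = np.mean((X @ (w_q * scale) - Y) ** 2)
loss_after = np.mean((X @ ttt_sgd(w_q, scale, X, Y) - Y) ** 2)
```

On this convex toy problem a few SGD epochs strictly reduce the loss introduced by quantization, which is the mechanism the submission appears to be exploiting.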

@shivnarainms22 shivnarainms22 changed the title Non-record: 10L Int5-MLP + TTT + Backout Connection Non-record: 10L Int5-MLP + TTT + Backout Connection (val_bpb=1.1574 on 8xH100 SXM) Mar 21, 2026
