Non-record: MLX prototyping harness with validated technique stack (val_bpb=1.9588, Mac) #328
Open
kingjulio8238 wants to merge 9 commits into openai:main from
Conversation
CLAUDE.md: agent working practices (plan mode, subagent strategy, iteration loop, verification, context efficiency).
docs/PLAN.md: full submission strategy: depth recurrence, QAT, and test-time compute exploits, with phased execution and a compute budget.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds NUM_UNIQUE_BLOCKS/NUM_LOOPS hyperparameters for block sharing. When enabled, a smaller set of shared blocks is looped multiple times instead of using the U-Net encoder/decoder skip architecture. Baseline mode is fully preserved when disabled (the default).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
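For readers unfamiliar with block sharing, the idea can be sketched in a few lines: loop NUM_UNIQUE_BLOCKS shared blocks NUM_LOOPS times, so the effective depth is their product while only NUM_UNIQUE_BLOCKS parameter sets are stored. This is an illustrative sketch, not the PR's actual code; the function and variable names are hypothetical.

```python
# Hypothetical sketch of block sharing: effective depth is
# len(blocks) * num_loops, but only len(blocks) parameter sets exist.
def forward_shared(x, blocks, num_loops):
    for _ in range(num_loops):
        for block in blocks:
            x = block(x)
    return x

# Toy usage: two shared "blocks" applied three times each (depth 6).
blocks = [lambda x: x + 1, lambda x: x * 2]
out = forward_shared(0, blocks, num_loops=3)  # → 14
```

With real transformer blocks the loop reuses the same weights each pass, which is why the commit describes it as an alternative to a distinct-block-per-layer U-Net stack.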
- MLP_MULT=3 support (wider MLP, -0.013 BPB)
- Int6 per-row quantization (QUANT_BITS=6, saves ~4 MB)
- FP16 tied embedding passthrough (FP16_EMBED=1)
- Sliding-window eval with compiled NTK-RoPE (EVAL_STRIDE, EVAL_SEQ_LEN)
- Muon decoupled weight decay (MUON_WEIGHT_DECAY)
- Overtone spectral embedding init (OVERTONE_INIT)
- Phase-transition resid_mix init (PHASE_RESID_MIX)
- Extra eval loops support (EVAL_NUM_LOOPS)
- Multi-eval mode (EVAL_CONFIGS for testing multiple configs per run)
- VAL_MAX_TOKENS for fast directional experiments
- Compiled forward for eval (compiled_forward)

Validated on Mac: near-zero quant gap (0.0002 BPB) with FP16 embed + Muon WD. All leaderboard openai#1 techniques implemented and tested. Depth recurrence explored and rejected (int6 quant gap too large). 1260 lines, under the 1500-line limit. All new features default-disabled.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
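The int6 per-row quantization above can be illustrated with a minimal sketch: one floating-point scale per weight-matrix row, symmetric rounding into the int6 range. This is a hedged illustration with hypothetical names; the PR's QUANT_BITS=6 path may differ in detail.

```python
import numpy as np

def quantize_per_row(w, bits=6):
    # Symmetric per-row quantization: one fp32 scale per row.
    qmax = 2 ** (bits - 1) - 1                    # 31 for int6
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q, scale):
    return q.astype(np.float32) * scale
```

The int6 values are stored in int8 containers here for simplicity; a real harness would pack them (or hand them to the checkpoint serializer) to realize the ~4 MB saving the commit mentions.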
25+ MLX experiments validating leaderboard techniques on Mac. Setup: 14L×416d, 750 steps, 10 shards, full FineWeb val, int8+zlib. Key finding: FP16 embed + Muon WD gives a near-zero quant gap (0.001 BPB). Negative results documented: depth recurrence + int6, DWA, eval-time loops. Supports the compute grant application for H100 validation.
Four-phase plan based on an analysis of the top 5 leaderboard submissions. Covers implementation needs, compute budget, and key techniques.
SmearGate, BigramHash, sliding-window eval, Muon WD, OrthoInit, Overtone init, phase resid_mix, int5/int6 quant, QAT with STE, zstd-22 compression, FP16 embed passthrough, SWA.
1333 lines (under 1500). All features default-disabled (backward compatible). Ready to run on H100 when compute credits arrive.
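QAT with STE (quantization-aware training with a straight-through estimator) deserves a one-screen sketch: the forward pass sees fake-quantized weights, while the gradient is applied to the full-precision master copy as if quantization were the identity. A minimal sketch with hypothetical names, not the PR's implementation:

```python
import numpy as np

def fake_quant(w, bits=6):
    # Quantize-dequantize in one step so downstream math sees the
    # values the deployed int6 weights would produce.
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.abs(w).max()) / qmax, 1e-12)
    return np.round(w / scale) * scale

def qat_step(w, grad_fn, lr=0.01, bits=6):
    wq = fake_quant(w, bits)   # forward at the quantized point
    g = grad_fn(wq)            # gradient w.r.t. the quantized weights
    return w - lr * g          # STE: update the master weights directly
```

The STE trick is the last line: `round()` has zero gradient almost everywhere, so the backward pass pretends d(fake_quant)/dw = 1 and passes the gradient straight through to the fp32 master weights.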
- Fix SWA: skip non-float tensors in averaging loop
- Fix quantization: guard behind master_process to save memory
- Update H100_PLAN: new SOTA 1.1254; add TTT, gradient-guided quant, LN Scale
- MLX sweep: WD=0.02 + LR=0.02 is best (-0.005 BPB); init tricks hurt at short training
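The SWA fix in the first bullet can be illustrated briefly: a stochastic-weight-averaging loop should running-average only floating-point tensors, while integer buffers (step counters, quantized weights) are copied through, since averaging them is meaningless. A hedged sketch with hypothetical names, assuming NumPy-style state dicts:

```python
import numpy as np

def swa_update(avg_state, new_state, n_averaged):
    # Incremental running average: avg += (new - avg) / (n + 1).
    for name, tensor in new_state.items():
        if not np.issubdtype(tensor.dtype, np.floating):
            avg_state[name] = tensor      # the fix: pass non-floats through
            continue
        prev = avg_state.get(name, tensor)
        avg_state[name] = prev + (tensor - prev) / (n_averaged + 1)
    return avg_state
```

Without the dtype guard, integer tensors either crash the averaging loop or get silently corrupted by float division, which is presumably what the commit is fixing.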
Summary
Non-record submission — Mac MLX prototyping only, pending H100 validation.
Submitting to document systematic technique exploration and support compute grant application.
val_bpb: 1.9588 (14L×416d, 750 steps, 10 shards, int8+zlib, full FineWeb val, Apple Silicon)
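As a reading aid, bits per byte (bpb) is conventionally the summed cross-entropy over the eval set, in nats, divided by ln(2) times the byte count of the underlying text. A minimal sketch, with hypothetical argument names (the harness's exact accounting of bytes and loss may differ):

```python
import math

def bits_per_byte(total_nll_nats, total_bytes):
    # Convert summed cross-entropy (nats over all eval tokens) to
    # bits per UTF-8 byte of the raw evaluation text.
    return total_nll_nats / (math.log(2) * total_bytes)
```

This normalization is what makes scores comparable across tokenizers: token-level loss depends on the vocabulary, but bytes of text do not.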
Approach
25+ MLX experiments validating leaderboard techniques, identifying what works and what doesn't.
Implemented & Validated
Key Finding
FP16 embed + Muon WD achieves near-zero quantization gap (0.001 BPB). Post-quant ≈ pre-quant.
Negative Results (documented)
Results
Test plan