
Add Looped Transformer Design non-record submission (non tuned) #325

Open

Aum08Desai wants to merge 1 commit into openai:main from Aum08Desai:looped-transformer-design-nonrecord

Conversation

@Aum08Desai

Adds a notable non-record 10-minute submission under records/track_non_record_16mb/.

Summary:

  • PR315-derived looped transformer with a shared recurrent core (a minimal sketch follows this list)
  • Partial RoPE, LN scaling, late QAT, XSA4, and bigram features
  • Under the 16,000,000 byte artifact cap
  • final_int6_sliding_window_exact val_bpb = 1.14620421
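
For readers new to the recurrent-depth idea, the sketch below shows the basic shape of a looped transformer: a single shared block whose parameters are reused across several loop iterations, so effective depth grows without growing the parameter count. This is an illustrative reconstruction under assumed dimensions and loop count only; it does not reproduce the partial RoPE, LN scaling, QAT, XSA4, or bigram details of the actual submission.

```python
# Minimal sketch of a looped transformer with a shared recurrent core.
# NOT the submitted code: all sizes, the loop count, and the plain
# pre-norm attention/MLP block are illustrative assumptions.
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    def __init__(self, vocab_size=50304, d_model=256, n_heads=4, n_loops=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One shared block: the same parameters are applied n_loops times,
        # trading parameter count for recurrent depth.
        self.core = nn.ModuleDict({
            "ln1": nn.LayerNorm(d_model),
            "attn": nn.MultiheadAttention(d_model, n_heads, batch_first=True),
            "ln2": nn.LayerNorm(d_model),
            "mlp": nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            ),
        })
        self.n_loops = n_loops
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        B, T = idx.shape
        x = self.embed(idx)
        # Boolean causal mask: True marks positions attention may not see.
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=idx.device), diagonal=1
        )
        for _ in range(self.n_loops):
            h = self.core["ln1"](x)
            a, _ = self.core["attn"](h, h, h, attn_mask=causal, need_weights=False)
            x = x + a
            x = x + self.core["mlp"](self.core["ln2"](x))
        return self.head(self.ln_f(x))
```

The shared core is what makes the design relevant under a fixed artifact-size cap: n_loops passes through the same weights add depth at essentially no cost in checkpoint bytes.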

Notes:

  • This run is far from optimized.
  • It is being submitted mainly as an architectural reference point for recurrent-depth / looped-transformer design under the Parameter Golf constraints.
  • There is still clear room for sweeps on loop geometry, shared-vs-untied allocation, attention cadence, optimizer settings, and batch/schedule choices.
  • If anyone wants to push this direction further, I would strongly encourage running proper sweeps on top of this baseline rather than treating this result as tuned.

