This page summarizes the public run record currently visible in the upstream challenge repository and public tracker.

Important scope note

This page separates two public surfaces:

  • the official accepted surface visible on the upstream main branch
  • the open and recently closed PR frontier, which is larger, faster-moving, and not always directly comparable

As of March 20, 2026, the public tracker exposed:

  • 14 official main-track record folders
  • 1 official non-record folder
  • 133 open PR-backed public submissions
  • 36 recently closed PR-backed public submissions

The public record is therefore no longer sparse, but it is uneven: the official accepted board is still relatively narrow, while the PR frontier is much wider and noisier.

Official main-track runs visible on main

These are the accepted main-track runs publicly visible on the upstream main branch as of March 20, 2026.

Run | Date | Score (val_bpb) | Author | Public motif
10L Int5-MLP + BigramHash(10240) | 2026-03-20 | 1.1428 | thwu1 | int5 quantization + optimizer tuning + token/output-side hashing
Int6 MLP3x + SmearGate + BigramHash | 2026-03-20 | 1.1458 | Raahil Shah | int6 widening + stabilization + BigramHash
11L MLP3x + Int6 QAT | 2026-03-20 | 1.1502 | aruniyer | QAT-heavy stack
SmearGate + OrthoInit + Muon WD | 2026-03-19 | 1.1556 | aquariouseworkman | stabilization + mixed-precision discipline
10L Int6 QAT + Zstd MLP2.6x | 2026-03-19 | 1.1586 | yahya010 | quantization + schedule + compression
Mixed Quant + Sliding Window Eval | 2026-03-19 | 1.1630 | aquariouseworkman | mixed precision + eval protocol
Muon WD + 10 layer | 2026-03-19 | 1.1748 | notapplica | optimizer / init tuning
Sliding Window Eval | 2026-03-19 | 1.1925 | Matthew Li | evaluation protocol
Lora TTT | 2026-03-19 | 1.1928 | samacqua | test-time training
4k seq length | 2026-03-19 | 1.2014 | Spokane Way | long-context
2048 seq length | 2026-03-18 | 1.2060 | Spokane Way | long-context
int6 mixed precision | 2026-03-18 | 1.2147 | Nan Liu | mixed-precision quantization
fp16 Embed | 2026-03-18 | 1.2197 | Renier Velazco | output-side / embedding precision
Naive Baseline | 2026-03-18 | 1.2244 | Baseline | baseline tied-embedding transformer
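
To make the spread on the accepted board easier to read, the scores above can be expressed as gains over the Naive Baseline. The following is a minimal sketch, not part of the upstream repository: the scores are copied from the table on this page, while the names `OFFICIAL_RUNS` and `gains_over_baseline` are hypothetical and exist only for illustration. It assumes lower val_bpb is better.

```python
# Scores (val_bpb, lower is better) copied from the accepted main-track
# table on this page. The variable and function names are illustrative only.
OFFICIAL_RUNS = [
    ("10L Int5-MLP + BigramHash(10240)", 1.1428),
    ("Int6 MLP3x + SmearGate + BigramHash", 1.1458),
    ("11L MLP3x + Int6 QAT", 1.1502),
    ("SmearGate + OrthoInit + Muon WD", 1.1556),
    ("10L Int6 QAT + Zstd MLP2.6x", 1.1586),
    ("Mixed Quant + Sliding Window Eval", 1.1630),
    ("Muon WD + 10 layer", 1.1748),
    ("Sliding Window Eval", 1.1925),
    ("Lora TTT", 1.1928),
    ("4k seq length", 1.2014),
    ("2048 seq length", 1.2060),
    ("int6 mixed precision", 1.2147),
    ("fp16 Embed", 1.2197),
    ("Naive Baseline", 1.2244),
]


def gains_over_baseline(runs, baseline_name="Naive Baseline"):
    """Return (name, absolute_gain, relative_gain_pct) tuples, largest gain first."""
    baseline = dict(runs)[baseline_name]
    rows = [
        (name, baseline - score, 100.0 * (baseline - score) / baseline)
        for name, score in runs
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, gain, pct in gains_over_baseline(OFFICIAL_RUNS):
        print(f"{name:<40} -{gain:.4f} bpb ({pct:.2f}%)")
```

Read this way, the whole accepted board spans roughly 0.08 bpb between the baseline and the current top entry, which is useful context when PR-side claims differ from it by larger margins.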

Official non-record run visible on main

Run | Date | Score (val_bpb) | Author | What it establishes
4-Hour Quasi-10B SP1024 | 2026-03-18 | 1.2074 | Will DePue | Longer training on roughly the same artifact family improves score, but does not by itself rewrite the problem.

README spotlight versus full accepted surface

One easy way to misread the public record is to treat the README leaderboard as the full official board.

As of March 20, 2026:

  • the README-highlighted list showed 8 spotlighted entries
  • the accepted main branch record surface exposed 14 main-track record folders

That difference matters because the README is a curated spotlight, not a complete public history.

Open PR frontier: selected examples

The PR frontier is now large enough that it should be tracked separately from the accepted board.

Selected examples visible on March 20, 2026:

Public claim | Score claim | Why it matters
Record Update: val-only + standard | 0.9695 / 1.1465 | Shows how strong the frontier can look once val-only comparisons enter the picture.
SOTA Attempt: Paid prefix | 1.0217 | Makes explicit that prefix-style memory/payment schemes are now public, not just speculative.
8L Paid Prefix + SmearGate + Int6 | 1.0539 | Extends the prefix idea into a more compression-aware low-bit stack.
FarnsworthEngine v1 — TTT + 11L Int6 MLP3x | 1.1303 | Confirms that test-time training remains a real public family.
11L + Efficient Partial XSA | 1.1307 | Suggests the public frontier is still searching over attention/eval structure, not just quantization knobs.
11L Int6 + WD=0.04 + SWA + FA3 | 1.1318 | Typical of the mature “quantization + optimizer + eval protocol” stack now common in open PRs.

What the public record now suggests

1. The accepted board is no longer baseline-only

The main board is now a genuine compact-model competition surface, not just a baseline plus one or two follow-ups.

2. The public frontier is broader than the accepted board

The open PR stream contains ideas that are either stronger numerically, harder to compare, or simply not yet normalized into the accepted record.

3. Comparability is part of the challenge now

Val-only runs, altered evaluation paths, and PR-side experimentation mean that “lowest visible number” is not the same as “cleanest accepted main-track result.”
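
The gap between the two readings can be made concrete with a small sketch. It is an illustration under stated assumptions, not a tool from the upstream repository: the `Entry` type, the `surface` labels ("accepted", "open_pr"), and the `best` helper are hypothetical. The scores are taken from the tables on this page, and for the claim that lists two numbers (0.9695 / 1.1465) only the lower one is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    val_bpb: float  # lower is better
    surface: str    # hypothetical label: "accepted" or "open_pr"


# A few rows copied from this page; the surface labels are illustrative.
ENTRIES = [
    Entry("10L Int5-MLP + BigramHash(10240)", 1.1428, "accepted"),
    Entry("Int6 MLP3x + SmearGate + BigramHash", 1.1458, "accepted"),
    Entry("Record Update: val-only + standard", 0.9695, "open_pr"),
    Entry("SOTA Attempt: Paid prefix", 1.0217, "open_pr"),
    Entry("11L + Efficient Partial XSA", 1.1307, "open_pr"),
]


def best(entries, surface=None):
    """Return the lowest-scoring entry, optionally restricted to one surface."""
    pool = [e for e in entries if surface is None or e.surface == surface]
    return min(pool, key=lambda e: e.val_bpb)


if __name__ == "__main__":
    lowest_visible = best(ENTRIES)
    lowest_accepted = best(ENTRIES, surface="accepted")
    print("lowest visible: ", lowest_visible.name, lowest_visible.val_bpb)
    print("lowest accepted:", lowest_accepted.name, lowest_accepted.val_bpb)
```

Filtering by surface before taking the minimum is the simplest guard against mixing the two boards; a fuller version would also need a label for evaluation protocol, which this page does not record per entry.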

4. Quantization stacks dominate the official surface

The accepted board is heavily shaped by low-bit quantization, schedule tuning, token/output-side tricks, and stabilization choices.

5. Recurrence is public, but not yet officially central

Recurrent or shared-depth ideas are visible in PRs and fork deltas, but the accepted main board is still more quantization-stack-heavy than recurrence-heavy.

Best way to read this page

Use the individual run pages for the detailed distinction between:

  • hard public facts
  • reasonable interpretation
  • unknowns the record does not settle
