This page summarizes the public run record currently visible in the upstream challenge repository and public tracker.
Important scope note
This page separates two public surfaces:
- the official accepted surface visible on the upstream main branch
- the open and recently closed PR frontier, which is larger, faster-moving, and not always directly comparable
As of March 20, 2026, the public tracker exposed:
- 14 official main-track record folders
- 1 official non-record folder
- 133 open PR-backed public submissions
- 36 recently closed PR-backed public submissions
That means the public record is no longer sparse. It is simply uneven: the official accepted board is still relatively narrow, while the PR frontier is much wider and noisier.
Official main-track runs visible on main
These are the accepted main-track runs publicly visible on the upstream main branch as of March 20, 2026.
| Run | Date | Score (val_bpb) | Author | Public motif |
|---|---|---|---|---|
| 10L Int5-MLP + BigramHash(10240) | 2026-03-20 | 1.1428 | thwu1 | int5 quantization + optimizer tuning + token/output-side hashing |
| Int6 MLP3x + SmearGate + BigramHash | 2026-03-20 | 1.1458 | Raahil Shah | int6 widening + stabilization + BigramHash |
| 11L MLP3x + Int6 QAT | 2026-03-20 | 1.1502 | aruniyer | QAT-heavy stack |
| SmearGate + OrthoInit + Muon WD | 2026-03-19 | 1.1556 | aquariouseworkman | stabilization + mixed-precision discipline |
| 10L Int6 QAT + Zstd MLP2.6x | 2026-03-19 | 1.1586 | yahya010 | quantization + schedule + compression |
| Mixed Quant + Sliding Window Eval | 2026-03-19 | 1.1630 | aquariouseworkman | mixed precision + eval protocol |
| Muon WD + 10 layer | 2026-03-19 | 1.1748 | notapplica | optimizer / init tuning |
| Sliding Window Eval | 2026-03-19 | 1.1925 | Matthew Li | evaluation protocol |
| Lora TTT | 2026-03-19 | 1.1928 | samacqua | test-time training |
| 4k seq length | 2026-03-19 | 1.2014 | Spokane Way | long-context |
| 2048 seq length | 2026-03-18 | 1.2060 | Spokane Way | long-context |
| int6 mixed precision | 2026-03-18 | 1.2147 | Nan Liu | mixed-precision quantization |
| fp16 Embed | 2026-03-18 | 1.2197 | Renier Velazco | output-side / embedding precision |
| Naive Baseline | 2026-03-18 | 1.2244 | Baseline | baseline tied-embedding transformer |
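To show how far the accepted board has moved from its starting point, here is a small sketch that computes each run's gap to the Naive Baseline. The scores come from the table above. The function name `gaps_vs_baseline` is only for illustration. With this arithmetic, the top accepted run is about 0.0816 bpb (roughly 6.7%) below the baseline.

```python
# Scores copied from the official main-track table above (val_bpb; lower is better).
OFFICIAL_MAIN_TRACK = {
    "10L Int5-MLP + BigramHash(10240)": 1.1428,
    "Int6 MLP3x + SmearGate + BigramHash": 1.1458,
    "11L MLP3x + Int6 QAT": 1.1502,
    "SmearGate + OrthoInit + Muon WD": 1.1556,
    "10L Int6 QAT + Zstd MLP2.6x": 1.1586,
    "Mixed Quant + Sliding Window Eval": 1.1630,
    "Muon WD + 10 layer": 1.1748,
    "Sliding Window Eval": 1.1925,
    "Lora TTT": 1.1928,
    "4k seq length": 1.2014,
    "2048 seq length": 1.2060,
    "int6 mixed precision": 1.2147,
    "fp16 Embed": 1.2197,
    "Naive Baseline": 1.2244,
}


def gaps_vs_baseline(scores, baseline_name="Naive Baseline"):
    """Return (run, absolute gap, relative gap) tuples, largest improvement first."""
    base = scores[baseline_name]
    rows = [
        (name, base - bpb, (base - bpb) / base)
        for name, bpb in scores.items()
        if name != baseline_name
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)


for name, delta, rel in gaps_vs_baseline(OFFICIAL_MAIN_TRACK):
    print(f"{name:<36} -{delta:.4f} bpb ({rel:.2%})")
```

Every accepted run beats the baseline. The size of the gaps is what separates the quantization-heavy entries at the top from the single-idea runs near the bottom.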
Official non-record run visible on main
| Run | Date | Score (val_bpb) | Author | What it establishes |
|---|---|---|---|---|
| 4-Hour Quasi-10B SP1024 | 2026-03-18 | 1.2074 | Will DePue | Longer training on roughly the same artifact family improves the score, but does not by itself change the nature of the problem: at 1.2074 it still trails 11 of the 14 accepted main-track runs. |
README spotlight versus full accepted surface
One easy way to misread the public record is to treat the README leaderboard as the full official board.
As of March 20, 2026:
- the README-highlighted list showed 8 spotlighted entries
- the accepted main-branch record surface exposed 14 main-track record folders
That difference matters because the README is a curated spotlight, not a complete public history.
Open PR frontier: selected examples
The PR frontier is now large enough that it should be tracked separately from the accepted board.
Selected examples visible on March 20, 2026:
| Public claim | Score claim | Why it matters |
|---|---|---|
| Record Update: val-only + standard | 0.9695 / 1.1465 | Shows how strong the frontier can look once val-only comparisons enter the picture. |
| SOTA Attempt: Paid prefix | 1.0217 | Makes explicit that prefix-style memory/payment schemes are now public, not just speculative. |
| 8L Paid Prefix + SmearGate + Int6 | 1.0539 | Extends the prefix idea into a more compression-aware low-bit stack. |
| FarnsworthEngine v1 — TTT + 11L Int6 MLP3x | 1.1303 | Confirms that test-time training remains a real public family. |
| 11L + Efficient Partial XSA | 1.1307 | Suggests the public frontier is still searching over attention/eval structure, not just quantization knobs. |
| 11L Int6 + WD=0.04 + SWA + FA3 | 1.1318 | Typical of the mature “quantization + optimizer + eval protocol” stack now common in open PRs. |
What the public record now suggests
1. The accepted board is no longer baseline-only
The main board is now a genuine compact-model competition surface, not just a baseline plus one or two follow-ups.
2. The public frontier is broader than the accepted board
The open PR stream contains ideas that are either stronger numerically, harder to compare, or simply not yet normalized into the accepted record.
3. Comparability is part of the challenge now
Val-only runs, altered evaluation paths, and PR-side experimentation mean that “lowest visible number” is not the same as “cleanest accepted main-track result.”
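A small ranking sketch makes the comparability point concrete. The entries and scores below are copied from the tables on this page. The `accepted` and `val_only` flags are a hypothetical reading of those tables, not fields the tracker publishes. `Submission`, `lowest_visible`, and `best_accepted_standard` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Submission:
    name: str
    val_bpb: float   # lower is better
    accepted: bool   # True if it appears among the official main-track record folders
    val_only: bool   # hypothetical flag: score reported under a val-only setup


# Scores copied from this page; the flags are an illustrative reading, not tracker metadata.
SUBMISSIONS = [
    Submission("Record Update: val-only + standard (val-only)", 0.9695, False, True),
    Submission("Record Update: val-only + standard (standard)", 1.1465, False, False),
    Submission("SOTA Attempt: Paid prefix", 1.0217, False, False),
    Submission("10L Int5-MLP + BigramHash(10240)", 1.1428, True, False),
    Submission("Int6 MLP3x + SmearGate + BigramHash", 1.1458, True, False),
]


def lowest_visible(subs: list[Submission]) -> Submission:
    """Naive ranking: the smallest number wins, regardless of provenance."""
    return min(subs, key=lambda s: s.val_bpb)


def best_accepted_standard(subs: list[Submission]) -> Optional[Submission]:
    """Comparable ranking: only accepted main-track runs not scored under a val-only setup."""
    eligible = [s for s in subs if s.accepted and not s.val_only]
    return min(eligible, key=lambda s: s.val_bpb, default=None)


print(lowest_visible(SUBMISSIONS).val_bpb)          # 0.9695
print(best_accepted_standard(SUBMISSIONS).val_bpb)  # 1.1428
```

The two answers (0.9695 versus 1.1428) show the comparability problem on a small scale. The naive minimum comes from an unaccepted val-only claim, not from the accepted board.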
4. Quantization stacks dominate the official surface
The accepted board is heavily shaped by low-bit quantization, schedule tuning, token/output-side tricks, and stabilization choices.
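Claim 4 can be roughly checked from run names alone. The sketch below uses a crude keyword heuristic: the hypothetical regex `QUANT_PATTERN` flags a run as a quantization stack if its name mentions int5, int6, QAT, or quant. Because it reads only names, it misses runs such as fp16 Embed or SmearGate + OrthoInit + Muon WD, whose motifs involve precision choices. It therefore undercounts rather than overcounts.

```python
import re

# Official main-track run names, best val_bpb first, copied from the table above.
BOARD_ORDER = [
    "10L Int5-MLP + BigramHash(10240)",
    "Int6 MLP3x + SmearGate + BigramHash",
    "11L MLP3x + Int6 QAT",
    "SmearGate + OrthoInit + Muon WD",
    "10L Int6 QAT + Zstd MLP2.6x",
    "Mixed Quant + Sliding Window Eval",
    "Muon WD + 10 layer",
    "Sliding Window Eval",
    "Lora TTT",
    "4k seq length",
    "2048 seq length",
    "int6 mixed precision",
    "fp16 Embed",
    "Naive Baseline",
]

# Hypothetical heuristic: a name mentioning int5/int6, QAT, or quant counts as a quantization stack.
QUANT_PATTERN = re.compile(r"int[56]|qat|quant", re.IGNORECASE)


def quant_share(names: list[str], top_k: int) -> tuple[int, int]:
    """Count flagged runs among the top_k best-ranked names."""
    top = names[:top_k]
    hits = sum(1 for name in top if QUANT_PATTERN.search(name))
    return hits, len(top)


for k in (5, len(BOARD_ORDER)):
    hits, total = quant_share(BOARD_ORDER, k)
    print(f"top {total}: {hits} quantization-named runs")
```

Under this heuristic, 4 of the 5 best runs are flagged, compared with 6 of 14 overall. That fits the reading that quantization stacks cluster at the top of the accepted board.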
5. Recurrence is public, but not yet officially central
Recurrent or shared-depth ideas are visible in PRs and fork deltas, but the accepted main board is still more quantization-stack-heavy than recurrence-heavy.
Best way to read this page
Use the individual run pages for the detailed distinction between:
- hard public facts
- reasonable interpretation
- unknowns the record does not settle