Public Runs

This page summarizes the public run record currently visible in the upstream challenge repository and public tracker.

Important scope note

This page now has to separate two different public surfaces:

- the official accepted surface visible on the upstream main branch
- the open and recently closed PR frontier, which is larger, faster-moving, and not always directly comparable

As of March 20, 2026, the public tracker exposed:

- 14 official main-track record folders
- 1 official non-record folder
- 133 open PR-backed public submissions
- 36 recently closed PR-backed public submissions

That means the public record is no longer sparse. It is simply uneven: the official accepted board is still relatively narrow, while the PR frontier is much wider and noisier.
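To make the unevenness concrete, the counts above can be totaled per surface. This is a minimal sketch: the numbers are copied from the tracker snapshot listed on this page, and the dictionary keys are illustrative names, not identifiers from the upstream repository.

```python
# Counts copied from the tracker snapshot above (March 20, 2026).
# The key names are hypothetical labels chosen for this sketch.
official = {"main_track_records": 14, "non_record": 1}
pr_frontier = {"open": 133, "recently_closed": 36}

official_total = sum(official.values())
pr_total = sum(pr_frontier.values())

# How many PR-backed submissions exist per officially accepted folder.
ratio = pr_total / official_total

print(f"official={official_total} pr_backed={pr_total} ratio={ratio:.1f}x")
```

On these counts the PR frontier is roughly eleven times the size of the accepted surface.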

Official main-track runs visible on `main`

These are the accepted main-track runs publicly visible on the upstream main branch as of March 20, 2026.

| Run | Date | Score (`val_bpb`) | Author | Public motif |
| --- | --- | --- | --- | --- |
| 10L Int5-MLP + BigramHash(10240) | 2026-03-20 | `1.1428` | `thwu1` | int5 quantization + optimizer tuning + token/output-side hashing |
| Int6 MLP3x + SmearGate + BigramHash | 2026-03-20 | `1.1458` | `Raahil Shah` | int6 widening + stabilization + BigramHash |
| 11L MLP3x + Int6 QAT | 2026-03-20 | `1.1502` | `aruniyer` | QAT-heavy stack |
| SmearGate + OrthoInit + Muon WD | 2026-03-19 | `1.1556` | `aquariouseworkman` | stabilization + mixed-precision discipline |
| 10L Int6 QAT + Zstd MLP2.6x | 2026-03-19 | `1.1586` | `yahya010` | quantization + schedule + compression |
| Mixed Quant + Sliding Window Eval | 2026-03-19 | `1.1630` | `aquariouseworkman` | mixed precision + eval protocol |
| Muon WD + 10 layer | 2026-03-19 | `1.1748` | `notapplica` | optimizer / init tuning |
| Sliding Window Eval | 2026-03-19 | `1.1925` | `Matthew Li` | evaluation protocol |
| Lora TTT | 2026-03-19 | `1.1928` | `samacqua` | test-time training |
| 4k seq length | 2026-03-19 | `1.2014` | `Spokane Way` | long-context |
| 2048 seq length | 2026-03-18 | `1.2060` | `Spokane Way` | long-context |
| int6 mixed precision | 2026-03-18 | `1.2147` | `Nan Liu` | mixed-precision quantization |
| fp16 Embed | 2026-03-18 | `1.2197` | `Renier Velazco` | output-side / embedding precision |
| Naive Baseline | 2026-03-18 | `1.2244` | `Baseline` | baseline tied-embedding transformer |
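One way to read the table is as a set of improvements over the Naive Baseline row. The sketch below assumes only the run names and scores listed above; the helper `gain_over_baseline` is a hypothetical name introduced for illustration, and lower `val_bpb` is treated as better, as the ordering of the table implies.

```python
# (run, val_bpb) pairs copied from the table above; lower val_bpb is better.
runs = [
    ("10L Int5-MLP + BigramHash(10240)", 1.1428),
    ("Int6 MLP3x + SmearGate + BigramHash", 1.1458),
    ("11L MLP3x + Int6 QAT", 1.1502),
    ("SmearGate + OrthoInit + Muon WD", 1.1556),
    ("10L Int6 QAT + Zstd MLP2.6x", 1.1586),
    ("Mixed Quant + Sliding Window Eval", 1.1630),
    ("Muon WD + 10 layer", 1.1748),
    ("Sliding Window Eval", 1.1925),
    ("Lora TTT", 1.1928),
    ("4k seq length", 1.2014),
    ("2048 seq length", 1.2060),
    ("int6 mixed precision", 1.2147),
    ("fp16 Embed", 1.2197),
    ("Naive Baseline", 1.2244),
]

baseline = dict(runs)["Naive Baseline"]


def gain_over_baseline(bpb):
    """Return the absolute and relative (percent) val_bpb reduction vs. the baseline."""
    delta = baseline - bpb
    return delta, 100.0 * delta / baseline


# The best accepted run is the one with the lowest score.
best_name, best_bpb = min(runs, key=lambda r: r[1])
best_delta, best_pct = gain_over_baseline(best_bpb)

for name, bpb in runs:
    delta, pct = gain_over_baseline(bpb)
    print(f"{name:<38} {bpb:.4f}  -{delta:.4f}  ({pct:.2f}%)")
```

On these figures the top accepted run sits 0.0816 `val_bpb` below the baseline, a reduction of roughly 6.7%.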

Official non-record run visible on `main`

| Run | Date | Score (`val_bpb`) | Author | What it establishes |
| --- | --- | --- | --- | --- |
| 4-Hour Quasi-10B SP1024 | 2026-03-18 | `1.2074` | `Will DePue` | Longer training on roughly the same artifact family improves score, but does not by itself rewrite the problem. |

README spotlight versus full accepted surface

One easy way to misread the public record is to treat the README leaderboard as the full official board.

As of March 20, 2026:

- the README-highlighted list showed 8 spotlighted entries
- the accepted main branch record surface exposed 14 main-track record folders

That difference matters because the README is a curated spotlight, not a complete public history.

Open PR frontier: selected examples

The PR frontier is now large enough that it should be tracked separately from the accepted board.

Selected examples visible on March 20, 2026:

| Public claim | Score claim | Why it matters |
| --- | --- | --- |
| Record Update: val-only + standard | `0.9695` / `1.1465` | Shows how strong the frontier can look once val-only comparisons enter the picture. |
| SOTA Attempt: Paid prefix | `1.0217` | Makes explicit that prefix-style memory/payment schemes are now public, not just speculative. |
| 8L Paid Prefix + SmearGate + Int6 | `1.0539` | Extends the prefix idea into a more compression-aware low-bit stack. |
| FarnsworthEngine v1 — TTT + 11L Int6 MLP3x | `1.1303` | Confirms that test-time training remains a real public family. |
| 11L + Efficient Partial XSA | `1.1307` | Suggests the public frontier is still searching over attention/eval structure, not just quantization knobs. |
| 11L Int6 + WD=0.04 + SWA + FA3 | `1.1318` | Typical of the mature “quantization + optimizer + eval protocol” stack now common in open PRs. |

What the public record now suggests

1. The accepted board is no longer baseline-only

The main board is now a genuine compact-model competition surface, not just a baseline plus one or two follow-ups.

2. The public frontier is broader than the accepted board

The open PR stream contains ideas that are numerically stronger, harder to compare, or simply not yet normalized into the accepted record.

3. Comparability is part of the challenge now

Val-only runs, altered evaluation paths, and PR-side experimentation mean that “lowest visible number” is not the same as “cleanest accepted main-track result.”
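The gap between the two readings can be shown with a few of the figures quoted on this page. This is a sketch under stated assumptions: the `Claim` type and its `accepted` flag are hypothetical, introduced here only to mark whether a number comes from the accepted main-track table or from an open PR claim, and only a handful of entries are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    name: str
    val_bpb: float
    accepted: bool  # hypothetical flag: True only for runs accepted on main


# A small sample of scores quoted on this page (not the full record).
claims = [
    Claim("10L Int5-MLP + BigramHash(10240)", 1.1428, accepted=True),
    Claim("Int6 MLP3x + SmearGate + BigramHash", 1.1458, accepted=True),
    Claim("Record Update: val-only + standard (val-only figure)", 0.9695, accepted=False),
    Claim("SOTA Attempt: Paid prefix", 1.0217, accepted=False),
    Claim("11L + Efficient Partial XSA", 1.1307, accepted=False),
]

# Reading 1: the lowest number visible anywhere in public.
lowest_visible = min(claims, key=lambda c: c.val_bpb)

# Reading 2: the lowest number among accepted main-track results only.
best_accepted = min((c for c in claims if c.accepted), key=lambda c: c.val_bpb)

print("lowest visible:", lowest_visible.name, lowest_visible.val_bpb)
print("best accepted: ", best_accepted.name, best_accepted.val_bpb)
```

The two selections pick different entries, which is why the surface a score comes from has to be stated before any comparison is made.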

4. Quantization stacks dominate the official surface

The accepted board is heavily shaped by low-bit quantization, schedule tuning, token/output-side tricks, and stabilization choices.

5. Recurrence is public, but not yet officially central

Recurrent or shared-depth ideas are visible in PRs and fork deltas, but the accepted main board is still more quantization-stack-heavy than recurrence-heavy.

Best way to read this page

Use the individual run pages for the detailed distinction between:

- hard public facts
- reasonable interpretation
- unknowns the record does not settle

Individual run pages

4-Hour Quasi-10B SP1024

Naive Baseline
