This page summarizes the public run record currently visible in the upstream challenge repository.
Important scope note
The public record is still sparse. At the moment, it mainly shows:
- a reference baseline for the main 10-minute / 16 MB track
- an unlimited-compute non-record extension of roughly the same baseline family
So this page should be read as an early history page, not as a mature leaderboard chronicle.
Public runs visible in the repository snapshot
| Run | Track | Score (val_bpb) | Total bytes | What it establishes |
|---|---|---|---|---|
| Naive Baseline | Main leaderboard | 1.2244 | 15,863,489 | A conventional small tied-embedding transformer can fit the artifact cap and produce a credible baseline score. |
| 4-Hour Quasi-10B SP1024 | Non-record, unlimited compute | 1.2074 | 15,810,161 | Longer training on essentially the same artifact family improves score, but does not by itself rewrite the problem. |
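To make the "just under the cap" observation concrete, the table's byte totals can be checked directly. This is a minimal sketch over the two published numbers, assuming the cap is exactly 16,000,000 bytes as stated below:

```python
# Headroom of the two public runs against the assumed 16,000,000-byte cap.
CAP = 16_000_000
runs = {
    "Naive Baseline": 15_863_489,
    "4-Hour Quasi-10B SP1024": 15_810_161,
}
for name, total_bytes in runs.items():
    headroom = CAP - total_bytes
    # Both runs leave only about 1% of the cap unused.
    print(f"{name}: {headroom:,} bytes under the cap ({headroom / CAP:.2%})")
```

Both artifacts leave under 200,000 bytes of slack, which is the quantitative basis for calling the challenge artifact-centric.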
What the public record already suggests
Even this tiny sample supports a few cautious conclusions:
1. The challenge is immediately artifact-centric
Both visible runs sit just under the 16,000,000 byte cap, which reinforces the point made in Constraints and scoring: the challenge is not about nominal parameter count alone.
2. The first public reference point is intentionally simple
The public main-track record is a baseline-style run, not an exotic architecture manifesto. That matters because it gives future submissions a clean anchor.
3. Extra training helps, but the public evidence is still narrow
The unlimited-compute run is better than the main-track baseline, but it is still the same broad model family. Publicly, we do not yet have a recurrence-heavy, tokenizer-heavy, or evaluation-time-compute-heavy submission to compare against it.
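The size of that improvement can be stated precisely from the table. A quick calculation over the two published val_bpb scores (lower is better):

```python
# Score delta between the two public runs (val_bpb: lower is better).
baseline_bpb = 1.2244   # Naive Baseline, main leaderboard
extended_bpb = 1.2074   # 4-Hour Quasi-10B SP1024, unlimited compute

delta = baseline_bpb - extended_bpb  # absolute improvement in bits per byte
relative = delta / baseline_bpb      # improvement as a fraction of the baseline score
print(f"delta = {delta:.4f} bpb ({relative:.2%} relative improvement)")
```

The gap is about 0.017 bpb, roughly a 1.4% relative improvement: real, but modest, which is consistent with reading it as "longer training on the same family" rather than a structural advance.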
What is still missing from the public record
There is not yet enough disclosed evidence in this snapshot to rank with confidence the public viability of:
- recursive and shared-parameter architectures
- quantization and outlier-aware methods
- tokenizer and vocabulary redesign
- evaluation-time compute
Those lanes are strongly suggested by the challenge framing and literature, but they are not yet represented by clearly public, leaderboard-facing run writeups in this snapshot.
Best way to read this page
Use the individual run pages for the detailed distinction between:
- hard public facts
- reasonable interpretation
- unknowns the record does not settle