This page summarizes the public run record currently visible in the upstream challenge repository.

Important scope note

The public record is still sparse. At the moment, it mainly shows:

  • a reference baseline for the main 10-minute / 16 MB track
  • an unlimited-compute non-record extension of roughly the same baseline family

So this page should be read as an early history page, not as a mature leaderboard chronicle.

Public runs visible in the repository snapshot

| Run | Track | Score (val_bpb) | Total bytes | What it establishes |
| --- | --- | --- | --- | --- |
| Naive Baseline | Main leaderboard | 1.2244 | 15,863,489 | A conventional small tied-embedding transformer can fit the artifact cap and produce a credible baseline score. |
| 4-Hour Quasi-10B SP1024 | Non-record, unlimited compute | 1.2074 | 15,810,161 | Longer training on essentially the same artifact family improves score, but does not by itself rewrite the problem. |

What the public record already suggests

Even this tiny sample supports a few cautious conclusions:

1. The challenge is immediately artifact-centric

Both visible runs sit just under the 16,000,000-byte cap (15,863,489 and 15,810,161 total bytes), which reinforces the point made in Constraints and scoring: the challenge is not about nominal parameter count alone.
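To make "just under the cap" concrete, here is a minimal arithmetic sketch using only the byte totals from the table on this page. The names `ARTIFACT_CAP_BYTES`, `runs`, and `headroom` are illustrative and do not come from the challenge repository.

```python
# Figures copied from the table of public runs on this page.
# All names here are hypothetical helpers, not part of the challenge code.
ARTIFACT_CAP_BYTES = 16_000_000

runs = {
    "Naive Baseline": 15_863_489,
    "4-Hour Quasi-10B SP1024": 15_810_161,
}


def headroom(total_bytes, cap=ARTIFACT_CAP_BYTES):
    """Return (bytes left under the cap, fraction of the cap used)."""
    return cap - total_bytes, total_bytes / cap


for name, total in runs.items():
    left, used = headroom(total)
    print(f"{name}: {left:,} bytes left, {used:.2%} of cap used")
```

Under these figures, the two runs leave 136,511 and 189,839 bytes of headroom respectively, so both use roughly 99% of the allowed artifact size.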

2. The first public reference point is intentionally simple

The public main-track record is a baseline-style run, not an exotic architecture manifesto. That matters because it gives future submissions a clean anchor.

3. Extra training helps, but the public evidence is still narrow

The unlimited-compute run scores better than the main-track baseline (1.2074 versus 1.2244 val_bpb), but it is still the same broad model family. Publicly, we do not yet have a recurrence-heavy, tokenizer-heavy, or evaluation-time-compute-heavy submission to compare against it.
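As a rough sense of scale, the gap between the two public scores can be computed directly from the table. This is only a sketch: `bpb_gain` is a hypothetical helper, and it assumes a lower val_bpb is the better score, which matches how this page describes the two runs.

```python
# Scores copied from the table of public runs on this page.
baseline_bpb = 1.2244   # Naive Baseline, main leaderboard
long_run_bpb = 1.2074   # 4-Hour Quasi-10B SP1024, non-record


def bpb_gain(reference, candidate):
    """Return (absolute gain rounded to 4 places, relative gain).

    Assumes lower val_bpb is better, so a positive gain means the
    candidate improves on the reference.
    """
    absolute = reference - candidate
    return round(absolute, 4), absolute / reference


abs_gain, rel_gain = bpb_gain(baseline_bpb, long_run_bpb)
print(f"absolute gain: {abs_gain} val_bpb, relative gain: {rel_gain:.2%}")
```

The difference works out to 0.017 val_bpb, a bit under 1.4% relative to the baseline, which is consistent with reading the longer run as an incremental improvement within the same model family.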

What is still missing from the public record

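There is not yet enough disclosed evidence here to rank, with confidence, the public viability of the lanes noted above: recurrence-heavy, tokenizer-heavy, and evaluation-time-compute-heavy approaches.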

Those lanes are strongly suggested by the challenge framing and literature, but they are not yet represented by clearly public, leaderboard-facing run writeups in this snapshot.

Best way to read this page

Use the individual run pages for the detailed distinction between:

  • hard public facts
  • reasonable interpretation
  • unknowns the record does not settle
