The challenge rules already imply a research agenda. Public literature and visible challenge work mostly sharpen that agenda rather than replace it.
Current public snapshot: March 20, 2026
As of March 20, 2026, the public tracker and upstream repository expose a much denser signal surface than the early README alone suggests:
- 14 accepted main-track runs on main
- 1 accepted non-record run
- 133 open PR-backed public submissions
- 36 recently closed PR-backed public submissions
- 7 ahead-of-upstream fork refs
The pressure is not evenly distributed. The public surface is now most crowded around:
- quantization
- optimizer / schedule tuning
- sliding-window evaluation
- token/output-side tricks such as BigramHash-like vocabulary-head interventions
Smaller but still real public clusters exist around:
- test-time training
- long-context variants
- output-head restructuring
- recurrence / shared-depth ideas
That snapshot should change how this page is read. The question is no longer “what seems plausible from the literature?” but “which literature-backed ideas have already become public competition surfaces, and which still remain comparatively open?”
The five big directions the challenge naturally creates
1. Shared depth and recurrent structure
The byte cap strongly favors models that reuse a small amount of structure many times.
Why it matters:
- buys depth without paying for fully unique blocks
- turns compute into effective capacity
- pairs naturally with evaluation-time unrolling
Best supporting pages:
- Recursive and shared-parameter architectures
- Relaxed Recursive Transformers
- MoEUT
- Fine-grained Parameter Sharing
- Transformers are SSMs
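The byte arithmetic behind this direction is easy to sketch. The parameter-count formula below is the standard rough count for a pre-norm transformer block; the dimensions, unroll factors, and bytes-per-parameter figure are illustrative assumptions, not numbers from the challenge or any accepted run:

```python
# Toy byte accounting: shared-depth vs fully unique transformer stacks.
# All dimensions and counts here are illustrative, not challenge rules.

def block_params(d_model: int, d_ff: int) -> int:
    """Rough parameter count for one transformer block:
    4 attention projections (Q, K, V, O) plus a 2-matrix MLP.
    Biases and norm scales are ignored for simplicity."""
    attn = 4 * d_model * d_model
    mlp = 2 * d_model * d_ff
    return attn + mlp

def artifact_bytes(n_unique_blocks: int, d_model: int, d_ff: int,
                   bytes_per_param: float) -> int:
    """Stored bytes depend only on *unique* blocks; the unroll count is free."""
    return int(n_unique_blocks * block_params(d_model, d_ff) * bytes_per_param)

# 12 effective layers either way: the unique stack stores 12 blocks,
# the shared stack stores 2 unique blocks and unrolls each 6 times.
unique = artifact_bytes(12, d_model=512, d_ff=2048, bytes_per_param=1.0)
shared = artifact_bytes(2, d_model=512, d_ff=2048, bytes_per_param=1.0)

print(unique, shared, unique // shared)
```

Under these toy numbers the shared stack stores 6x fewer bytes for the same effective depth, which is the whole appeal: the cap charges for unique structure, not for reuse.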
2. Quantization-aware robustness and outlier control
If the scored path runs through a compressed artifact, training must produce weights that survive that path.
Why it matters:
- post-roundtrip quality is the real target
- outliers and brittle scales can waste bytes and damage quality
- small normalization or scaling changes can matter more than larger architectural changes
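The outlier point above can be made concrete with a minimal symmetric per-tensor roundtrip. This is a generic max-abs quantization scheme written for illustration; it is not the challenge's actual export path, and the tensors are made up:

```python
# Minimal symmetric per-tensor int8 roundtrip, showing how a single
# outlier inflates the scale and degrades precision for every other weight.
# Pure-Python sketch under assumed conventions, not the scored exporter.

def quantize_roundtrip(weights, n_bits=8):
    """Quantize to signed n_bits with a max-abs scale, then dequantize."""
    qmax = 2 ** (n_bits - 1) - 1                   # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q]

def max_error(weights):
    """Worst-case absolute error after the quantize-dequantize roundtrip."""
    deq = quantize_roundtrip(weights)
    return max(abs(w - d) for w, d in zip(weights, deq))

tame = [0.01 * i for i in range(-50, 51)]   # well-behaved tensor
spiky = tame[:-1] + [50.0]                  # same tensor plus one outlier

print(max_error(tame), max_error(spiky))    # outlier blows up the roundtrip
```

One value 100x larger than the rest stretches the scale by the same factor, so every other weight loses that much resolution. This is why outlier control and normalization tweaks can matter more than architecture when the scored path runs through a compressed artifact.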
Best supporting pages:
3. Selective precision instead of uniform precision
A fixed artifact budget rarely wants perfectly uniform treatment of every tensor.
Why it matters:
- a small protected subset may buy more than globally raising precision
- mixed artifacts can target the truly sensitive parts of the model
- this reframes compression as allocation rather than only shrinkage
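The allocation framing reduces to simple byte arithmetic. The tensor names, sizes, and sensitivity choices below are hypothetical, picked only to show the shape of the trade-off:

```python
# Byte-budget arithmetic for mixed precision: protect a small sensitive
# subset at high precision, store everything else low-bit.
# Tensor names and sensitivity labels are hypothetical.

def total_bytes(tensors, protected, hi_bits=16, lo_bits=4):
    """tensors: {name: n_params}; protected: names stored at hi_bits."""
    bits = 0
    for name, n in tensors.items():
        bits += n * (hi_bits if name in protected else lo_bits)
    return bits // 8

model = {"embed": 4_000_000, "trunk": 20_000_000,
         "lm_head": 4_000_000, "norms": 20_000}

uniform_8bit = sum(model.values())  # 8 bits/param == 1 byte/param
mixed = total_bytes(model, protected={"norms", "embed"})

print(uniform_8bit, mixed)
```

In this toy budget the mixed artifact is smaller than uniform 8-bit even though the protected tensors sit at 16 bits, because the bulk of the parameters drop to 4. Whether the protected subset actually deserves the extra bits is an empirical question, which is the point of treating compression as allocation.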
Best supporting pages:
4. Tokenizer and output-head efficiency
The challenge metric is bits per byte, not bits per token. That makes the tokenizer frontend and the output side more central than many model builders expect.
Why it matters:
- sequence length interacts with evaluation cost
- vocabulary choices affect both tokenizer assets and output-layer size
- the output side can dominate bytes in small models
Best supporting pages:
- Tokenizer and vocabulary efficiency
- ReTok
- Vocabulary Compression
- VQ-Logits
- The LM Head is a Gradient Bottleneck
- Tokenizer Evaluation Across Scales
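The conversion from a token-level loss to the byte-level metric is standard arithmetic, and it shows why tokenizer choices move the score even at identical per-token loss. The formula is the usual nats-to-bits conversion; the token and byte counts below are made up:

```python
# Converting a token-level cross-entropy into a byte-level metric.
# The formula is standard; the specific counts below are illustrative.
import math

def bits_per_byte(total_nats: float, n_bytes: int) -> float:
    """Total cross-entropy in nats over the eval text -> bits per raw byte."""
    return total_nats / (n_bytes * math.log(2))

# Two tokenizers with the SAME per-token loss (2.5 nats) over the same
# 1 MB of text, differing only in how many tokens they emit.
n_bytes = 1_000_000
coarse = bits_per_byte(2.5 * 200_000, n_bytes)  # ~5 bytes per token
fine = bits_per_byte(2.5 * 400_000, n_bytes)    # ~2.5 bytes per token

print(round(coarse, 3), round(fine, 3))
```

The fine tokenizer pays the same per-token loss twice as often per byte, so its bits-per-byte is twice as bad. Per-token loss alone says nothing under this metric; compression of the frontend is part of the model.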
5. Evaluation-time compute
Parameter Golf leaves room for methods that store less but compute more at evaluation time, as long as the run stays within the cap.
Why it matters:
- capability may be cheaper to recompute than to store
- recurrent cores get much more attractive when extra unrolling is legal
- tiny models may benefit disproportionately from carefully budgeted extra passes
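The store-less-compute-more trade can be illustrated with any fixed-point core: one stored rule, reused as many times as the evaluation budget allows. The Newton iteration below is an arbitrary stand-in for a recurrent core; only the trade-off it exhibits is the point:

```python
# Toy illustration of evaluation-time compute: a fixed-point core whose
# stored description never grows, while extra eval-time iterations buy
# accuracy. The map itself is arbitrary; the trade-off is the point.

def unrolled(core, x0, steps):
    """Apply the same stored 'core' repeatedly; steps cost compute, not bytes."""
    x = x0
    for _ in range(steps):
        x = core(x)
    return x

# Newton's iteration for sqrt(2): a single stored rule, reused.
core = lambda x: 0.5 * (x + 2.0 / x)

shallow = abs(unrolled(core, 1.0, 2) ** 2 - 2.0)
deep = abs(unrolled(core, 1.0, 6) ** 2 - 2.0)

print(shallow, deep)  # more unrolls, same bytes, smaller error
```

The stored artifact is identical in both runs; only the evaluation-time step count changes. For a capped-bytes challenge, any capability that can be recomputed this way is capability that does not need to be stored.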
Best supporting pages:
What looks crowded already
These directions already have strong public motivation and visible competition as of March 20, 2026:
- low-bit robustness and export-aware training
- optimizer / schedule tuning attached to those low-bit stacks
- sliding-window evaluation and related protocol-aware scoring improvements
- token/output-side tricks that stay close to the existing transformer backbone
- initialization and stabilization tweaks that improve quantization friendliness
That does not make them exhausted. It means incremental work there has to be more specific and more empirically disciplined.
What still looks relatively open
Some areas still appear underexplored or at least less settled in the public record:
- officially accepted shared-depth / recurrent records; recurrence is visible in open PRs and fork deltas, but it is not yet the center of the accepted board
- output-head or paid-prefix style memory schemes that survive the standard accepted evaluation path
- tokenizer redesign aimed directly at the challenge metric rather than only token-side hashing layered on the current setup
- evaluation-time memory or latent refinement that remains comparable under the ordinary main-track path, rather than only under val-only or altered evaluation settings
- artifact formats that go beyond dense low-bit storage while keeping the bookkeeping simple enough to be acceptable
The most important interaction effects
The best challenge ideas are usually not isolated tricks. They compound across lanes:
- recurrence + eval-time compute can buy capability without storing more unique blocks
- normalization + outlier control can make low-bit export much more forgiving
- tokenizer efficiency + smaller output head can free bytes for the core model
- shared structure + selective precision can spend expensive bytes only where sharing hurts most
- test-time training + quantization stacks can create public-score jumps, but often at the cost of harder comparability
Newly public directions that deserve more weight
Three directions now deserve more emphasis than the older version of this page gave them:
1. Test-time adaptation is no longer hypothetical
The public surface now includes both an accepted LoRA TTT run and multiple stronger open-PR TTT variants. That means evaluation-time compute is not just about search or reranking anymore; it clearly includes adaptation during evaluation.
2. Paid-prefix or prefix-memory ideas are now visibly in play
Public PRs around paid-prefix schemes show that some competitors are explicitly trying to move capacity out of conventional trunk storage and into more structured externalized memory-like artifacts. Even if these do not become accepted records, they are now part of the public research map.
3. Recurrence is public, but still not officially proven
Shared-depth and recurrent ideas are no longer speculative from the literature alone. They are showing up in PRs and fork deltas. But because they are not yet dominant on the accepted board, they still read more like a live frontier than a settled lane.
How to use this page
If you want a challenge-level orientation:
- read Constraints and scoring
- read History and public runs
- use the atlas to see how the lanes fit together
- then drop into the relevant lane pages and paper notes