The challenge rules already imply a research agenda. Public literature and visible challenge work mostly sharpen that agenda rather than replace it.

The five big directions the challenge naturally creates

1. Shared depth and recurrent structure

The byte cap strongly favors models that reuse a small amount of structure many times.

Why it matters:

  • buys depth without paying for fully unique blocks
  • turns compute into effective capacity
  • pairs naturally with evaluation-time unrolling
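The byte accounting behind this trade can be sketched in a few lines. The block size formula and all dimensions below are made up for illustration; the point is only that unrolling a shared block adds depth at constant parameter (and byte) cost.

```python
# Illustrative parameter accounting: one shared block unrolled T times
# versus T unique blocks. The block composition (attention ~4*d^2 plus
# a 4x MLP ~8*d^2, biases ignored) is a hypothetical stand-in.

def block_params(d_model: int) -> int:
    """Parameters of one hypothetical transformer-style block."""
    return 4 * d_model**2 + 8 * d_model**2

def stack_params(d_model: int, depth: int) -> int:
    """Unique block per layer: bytes grow linearly with depth."""
    return depth * block_params(d_model)

def shared_params(d_model: int, unroll: int) -> int:
    """One block reused `unroll` times: same effective depth,
    constant storage cost -- unrolling adds no parameters."""
    return block_params(d_model)

d = 256
print(stack_params(d, 12))   # 12 unique blocks
print(shared_params(d, 12))  # 1 block unrolled 12 times: 12x fewer params
```

Under a hard byte cap, the second column is what the scorer sees; the unroll count is free to spend as evaluation-time compute instead.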



2. Quantization-aware robustness and outlier control

If the scored path runs through a compressed artifact, training must produce weights that survive that path.

Why it matters:

  • post-roundtrip quality is the real target
  • outliers and brittle scales can waste bytes and damage quality
  • small normalization or scaling changes can matter more than larger architectural changes
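The outlier point is easy to demonstrate with a minimal fake-quantization roundtrip. This is a sketch with a single per-tensor scale and made-up weight statistics, not any specific export path; it shows how one large weight stretches the scale and degrades precision for every other weight.

```python
import random

# Symmetric int8 quantize-then-dequantize with one per-tensor scale.
# A single outlier inflates the scale and multiplies the roundtrip
# error of the entire tensor.

def roundtrip_int8(weights):
    """Quantize to int8 with a max-abs scale, then dequantize."""
    scale = max(abs(w) for w in weights) / 127.0
    return [max(-127, min(127, round(w / scale))) * scale for w in weights]

def mean_abs_error(weights):
    rt = roundtrip_int8(weights)
    return sum(abs(a - b) for a, b in zip(weights, rt)) / len(weights)

rng = random.Random(0)
w = [rng.gauss(0.0, 0.02) for _ in range(4096)]       # hypothetical tensor

err_clean = mean_abs_error(w)
err_outlier = mean_abs_error(w + [2.0])               # one outlier added

print(err_clean, err_outlier)  # the outlier multiplies the roundtrip error
```

Clipping, per-channel scales, or training-time outlier suppression all attack exactly this failure mode, which is why small scaling changes can outweigh architectural ones.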


3. Selective precision instead of uniform precision

Under a fixed artifact budget, giving every tensor perfectly uniform treatment is rarely the best spend.

Why it matters:

  • a small protected subset may buy more than globally raising precision
  • mixed artifacts can target the truly sensitive parts of the model
  • this reframes compression as allocation rather than only shrinkage
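The allocation framing can be made concrete with simple byte accounting. The tensor names, parameter counts, and bit widths below are all hypothetical; the sketch shows that protecting a small sensitive subset at 16-bit while the bulk goes 4-bit can undercut a uniform 8-bit artifact.

```python
# Byte accounting for a mixed-precision artifact (illustrative sizes).

tensors = {              # hypothetical parameter counts
    "embeddings": 8_000_000,
    "blocks":     20_000_000,
    "layernorms": 50_000,    # tiny, but often quantization-sensitive
    "head":       8_000_000,
}
protected = {"layernorms"}   # spend expensive bytes only here

def artifact_bytes(bits_default: int, bits_protected: int = 16) -> int:
    """Total artifact size when protected tensors keep high precision."""
    total_bits = 0
    for name, n in tensors.items():
        bits = bits_protected if name in protected else bits_default
        total_bits += n * bits
    return total_bits // 8

uniform8 = artifact_bytes(8, 8)   # everything at 8-bit
mixed = artifact_bytes(4)         # 4-bit bulk, 16-bit protected subset
print(uniform8, mixed)            # mixed is roughly half the bytes
```

The open question is which subset earns protection, which is an empirical sensitivity question rather than a storage-format one.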


4. Tokenizer and output-head efficiency

The challenge metric is bits per byte, not bits per token. That makes the tokenizer frontend and the output head more central than many model builders expect.

Why it matters:

  • sequence length interacts with evaluation cost
  • vocabulary choices affect both tokenizer assets and output-layer size
  • the output side can dominate bytes in small models
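The metric's pressure on the tokenizer can be shown with a toy conversion. This is not the challenge's official scorer, and the loss and token counts are invented; the point is that the same per-token loss scores twice as badly when each token covers half as many bytes.

```python
import math

# Converting a mean per-token cross-entropy (in nats) to bits per byte.
# bits/byte = (loss / ln 2) * tokens / bytes

def bits_per_byte(nats_per_token: float, n_tokens: int, n_bytes: int) -> float:
    return (nats_per_token / math.log(2)) * n_tokens / n_bytes

text_bytes = 10_000
loss = 2.5  # hypothetical mean per-token loss, in nats

coarse = bits_per_byte(loss, n_tokens=2_500, n_bytes=text_bytes)  # 4 bytes/token
fine = bits_per_byte(loss, n_tokens=5_000, n_bytes=text_bytes)    # 2 bytes/token
print(coarse, fine)  # identical per-token loss, 2x worse bits per byte
```

A bigger vocabulary improves bytes per token but grows the tokenizer assets and the output layer, which is exactly the tension this lane studies.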


5. Evaluation-time compute

Parameter Golf leaves room for methods that store less but compute more at evaluation time, as long as the run stays within the cap.

Why it matters:

  • capability may be cheaper to recompute than to store
  • recurrent cores get much more attractive when extra unrolling is legal
  • tiny models may benefit disproportionately from carefully budgeted extra passes
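A toy analogy, not a language model: a tiny stored update rule, iterated more times at evaluation, recovers precision that would otherwise have to be stored. Here the "rule" is one Newton step toward sqrt(2); extra unrolling buys accuracy at zero extra bytes.

```python
import math

def refine(x: float) -> float:
    """One stored update rule; its byte cost is constant."""
    return 0.5 * (x + 2.0 / x)

def evaluate(n_steps: int, x0: float = 1.0) -> float:
    """Unroll the same rule n_steps times at evaluation."""
    x = x0
    for _ in range(n_steps):
        x = refine(x)
    return x

err_short = abs(evaluate(2) - math.sqrt(2))
err_long = abs(evaluate(6) - math.sqrt(2))
print(err_short, err_long)  # same stored rule, far smaller error
```

The model-scale analogue is a recurrent core unrolled further at evaluation than at training, provided the run stays inside the challenge's compute rules.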


What looks crowded already

These directions already have strong public motivation and visible competition:

  • shared-depth / recurrent architectures
  • low-bit robustness and export-aware training
  • basic parameter tying and width-for-depth trades

That does not make them exhausted. It means incremental work there has to be more specific and more empirically disciplined.

What still looks relatively open

Some areas still appear underexplored or at least less settled in the public record:

  • artifact formats that go beyond simple dense low-bit storage
  • selective precision schemes with clean byte accounting
  • tokenizer redesign under the exact challenge objective
  • evaluation-time latent refinement that is trained for the score rather than bolted on later
  • cross-layer basis sharing with extremely byte-conscious metadata

The most important interaction effects

The best challenge ideas are usually not isolated tricks. They compound across lanes:

  • recurrence + eval-time compute can buy capability without storing more unique blocks
  • normalization + outlier control can make low-bit export much more forgiving
  • tokenizer efficiency + smaller output head can free bytes for the core model
  • shared structure + selective precision can spend expensive bytes only where sharing hurts most

How to use this page

If you want a challenge-level orientation:

  1. read Constraints and scoring
  2. read History and public runs
  3. use the atlas to see how the lanes fit together
  4. then drop into the relevant lane pages and paper notes