This page is about likely strategy families, not settled winners.
That distinction matters. The public record is not yet rich enough to say which family is dominant. What we can say is which families the challenge rules, early public runs, and adjacent literature make most plausible.
Family 1: parameter reuse over stored uniqueness
The challenge strongly rewards methods that reduce the number of unique stored weights.
Why it looks promising:
- artifact bytes are capped directly
- the public README explicitly calls out parameter tying and depth recurrence
- shared-depth methods naturally convert storage into compute
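To make the storage-for-compute exchange concrete, here is a minimal parameter-count sketch. The function name `stored_params` and the sizes are illustrative assumptions, not the challenge's actual architecture; it ignores biases, attention, and embeddings.

```python
def stored_params(d_model: int, depth: int, shared: bool) -> int:
    """Unique stored weights for a stack of square (d_model x d_model) layers.

    With tying (shared=True), one block is stored and reused `depth` times
    at runtime, so storage stays constant while forward-pass compute still
    scales with depth. Biases and attention are ignored for simplicity.
    """
    per_layer = d_model * d_model
    return per_layer if shared else per_layer * depth

# Tying 12 layers of width 512 down to one shared block cuts stored
# uniqueness 12x at identical forward-pass cost.
assert stored_params(512, 12, shared=False) == 12 * stored_params(512, 12, shared=True)
```

This is the accounting behind depth recurrence: the artifact pays for one block, the forward pass pays for twelve.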
Best links:
- recursive and shared-parameter architectures
- Recursive layer sharing
- Relaxed Recursive Transformers
- Fine-grained Parameter Sharing
- MoEUT
Public-status note:
- challenge-implied and literature-backed
- not yet clearly represented by a public run folder in this snapshot
Family 2: compression-aware training rather than post-hoc compression only
The visible baseline runs already show a real gap between pre-quantization loss and the loss of the recovered, post-roundtrip artifact. That makes this family hard to ignore.
Why it looks promising:
- the score is applied to the recovered artifact
- longer training alone does not erase compression damage
- protecting fragile tensors selectively may be more efficient than treating all tensors uniformly
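A minimal sketch of what "selective protection" could mean, assuming a simple symmetric per-tensor scheme. The names `fake_quant_roundtrip` and `selective_roundtrip` are hypothetical, and real challenge entries would need to charge protected tensors at their full byte cost.

```python
def fake_quant_roundtrip(w, bits=8):
    """Symmetric per-tensor quantize -> dequantize, mimicking what the
    scored, recovered artifact would contain."""
    qmax = 2 ** (bits - 1) - 1
    scale = max((abs(x) for x in w), default=0.0) / qmax
    if scale == 0.0:
        return list(w)
    return [round(x / scale) * scale for x in w]

def selective_roundtrip(tensors, fragile, bits=8):
    """Roundtrip every tensor except those flagged fragile (e.g. outlier-heavy
    projections), which are kept at full precision and full byte cost."""
    return {name: (list(w) if name in fragile else fake_quant_roundtrip(w, bits))
            for name, w in tensors.items()}
```

Training against a roundtrip like this (rather than quantizing only after training) is what distinguishes compression-aware training from post-hoc compression.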
Best links:
- quantization and outlier handling
- Outlier-aware compression
- Normalization before projections
- Extra RMSNorm
- pQuant
- QuEST
Public-status note:
- partly supported by the behavior of public runs
- not yet represented by a clearly documented public method family in the run archive
Family 3: tokenizer and head co-design
The baseline already shows one simple version of this family: a small vocabulary plus tied embeddings.
Why it looks promising:
- vocab size and output-head cost hit the artifact budget directly
- the challenge metric is tokenizer-agnostic bits per byte, which changes the usual tokenizer tradeoffs
- a larger vocabulary can shorten sequences while inflating output-side storage, so the tokenizer and the head must be designed together
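The tension above can be sketched in a few lines. Both function names are hypothetical; the bits-per-byte formula follows the common definition (total negative log-likelihood normalized by raw UTF-8 byte count), which may differ in detail from the challenge's exact scoring.

```python
import math

def bits_per_byte(total_nll_nats: float, n_utf8_bytes: int) -> float:
    """Tokenizer-agnostic loss: total NLL in nats over a text, normalized
    by its raw UTF-8 byte count. Shorter token sequences do not help
    unless per-token loss drops enough to compensate."""
    return total_nll_nats / (n_utf8_bytes * math.log(2))

def embedding_head_bytes(vocab_size: int, d_model: int,
                         bytes_per_param: float = 2.0, tied: bool = True) -> float:
    """Artifact bytes spent on token embeddings plus the LM head.
    Tying stores one matrix instead of two."""
    copies = 1 if tied else 2
    return vocab_size * d_model * bytes_per_param * copies
```

For illustration, a 32k vocabulary at width 512 in 2-byte precision costs roughly 33 MB tied and 66 MB untied, which is why vocab size hits a capped artifact budget so directly.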
Best links:
- tokenizer and vocabulary efficiency
- Tokenizer efficiency
- The LM head is part of the compression problem
- ReTok
- Vocabulary Compression
- Beyond Text Compression
Public-status note:
- already weakly visible in baseline form
- not yet publicly explored in a more aggressive or clearly novel way
Family 4: compute-for-bytes exchanges at evaluation time
The challenge explicitly allows bounded evaluation methods as long as they stay within the rules.
Why it looks promising:
- evaluation can spend time to recover capability without storing more weights
- this is especially attractive if the best static artifact is still too small to express everything directly
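As a toy analogy (not a claim about any actual entry), here is the shape of a compute-for-storage exchange: the stored "artifact" is a single tiny update rule, and evaluation-time iterations recover precision that was never stored. The name `refine` is hypothetical.

```python
def refine(x: float, step, iters: int) -> float:
    """Apply one stored update rule repeatedly: extra evaluation-time
    compute, zero extra stored state."""
    for _ in range(iters):
        x = step(x)
    return x

# Newton's iteration x -> (x + a/x) / 2 converges to sqrt(a). The stored
# artifact is just the rule; accuracy is bought with iterations.
approx = refine(1.0, lambda x: (x + 2.0 / x) / 2.0, iters=20)
assert abs(approx - 2.0 ** 0.5) < 1e-12
```

Depth-recurrent models admit the same move directly: run the shared block for more iterations at evaluation than the artifact's nominal depth, within whatever compute bound the rules allow.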
Best links:
- evaluation-time compute and inference scaling
- Compute-for-storage exchange
- Iterative refinement over stored depth
- Inference Scaling Laws
Public-status note:
- explicitly invited by challenge framing
- not yet visibly demonstrated in the public runs summarized here
Family 5: training-budget exploitation without artifact redesign
This family is already visible in a minimal sense through the unlimited-compute non-record run.
Why it matters:
- it asks how much quality can still be extracted from a fixed artifact family
- it helps separate “need a better artifact” from “need a better optimization path”
Best links:
- training economics and small-model bottlenecks
- Computational Bottlenecks of Training SLMs
- 4-Hour Quasi-10B SP1024
Public-status note:
- publicly demonstrated in a narrow form
- unlikely to be the whole story if artifact-centered methods improve
Summary judgment
If the public field remains sparse, the safest synthesis is:
- the challenge already rewards artifact-aware discipline
- the strongest-looking future families are those that trade stored uniqueness for compute, selective precision, or better token/head economics
- the public record is still too early to declare a dominant recipe