This page is about likely strategy families, not settled winners.
That distinction matters. The public record is not yet rich enough to say which family is dominant. What we can say is which families the challenge rules, early public runs, and adjacent literature make most plausible.
Family 1: parameter reuse over stored uniqueness
The challenge strongly rewards methods that reduce the number of unique stored weights.
Why it looks promising:
- artifact bytes are capped directly
- the public README explicitly calls out parameter tying and depth recurrence
- shared-depth methods naturally convert storage into compute
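To make the storage-for-compute exchange concrete, here is a minimal parameter-count sketch. The function name `stored_params` and the sizes are illustrative assumptions, not the challenge's actual architecture; it ignores biases, attention, and embeddings.

```python
def stored_params(d_model: int, depth: int, shared: bool) -> int:
    """Unique stored weights for a stack of square (d_model x d_model) layers.

    With tying (shared=True), one block is stored and reused `depth` times
    at runtime, so storage stays constant while forward-pass compute still
    scales with depth. Biases and attention are ignored for simplicity.
    """
    per_layer = d_model * d_model
    return per_layer if shared else per_layer * depth

# Tying 12 layers of width 512 down to one shared block cuts stored
# uniqueness 12x at identical forward-pass cost.
assert stored_params(512, 12, shared=False) == 12 * stored_params(512, 12, shared=True)
```

This is the accounting behind depth recurrence: the artifact pays for one block, the forward pass pays for twelve.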
Best links:
- recursive and shared-parameter architectures
- Recursive layer sharing
- Relaxed Recursive Transformers
- Fine-grained Parameter Sharing
- MoEUT
Public-status note:
- challenge-implied and literature-backed
- not yet clearly represented by a public run folder in this snapshot
Family 2: compression-aware training rather than post-hoc compression only
The visible baseline runs already show a real gap between pre-quantization loss and the loss of the recovered, post-roundtrip artifact. That makes this family hard to ignore.
Why it looks promising:
- the score is applied to the recovered artifact
- longer training alone does not erase compression damage
- protecting fragile tensors selectively may be more efficient than treating all tensors uniformly
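A minimal sketch of what "selective protection" could mean, assuming a simple symmetric per-tensor scheme. The names `fake_quant_roundtrip` and `selective_roundtrip` are hypothetical, and real challenge entries would need to charge protected tensors at their full byte cost.

```python
def fake_quant_roundtrip(w, bits=8):
    """Symmetric per-tensor quantize -> dequantize, mimicking what the
    scored, recovered artifact would contain."""
    qmax = 2 ** (bits - 1) - 1
    scale = max((abs(x) for x in w), default=0.0) / qmax
    if scale == 0.0:
        return list(w)
    return [round(x / scale) * scale for x in w]

def selective_roundtrip(tensors, fragile, bits=8):
    """Roundtrip every tensor except those flagged fragile (e.g. outlier-heavy
    projections), which are kept at full precision and full byte cost."""
    return {name: (list(w) if name in fragile else fake_quant_roundtrip(w, bits))
            for name, w in tensors.items()}
```

Training against a roundtrip like this (rather than quantizing only after training) is what distinguishes compression-aware training from post-hoc compression.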
Best links:
- quantization and outlier handling
- Outlier-aware compression
- Normalization before projections
- Extra RMSNorm
- pQuant
- QuEST
Public-status note:
- partly supported by the behavior of public runs
- not yet represented by a clearly documented public method family in the run archive
Family 3: tokenizer and head co-design
The baseline already shows one simple version of this family: a small vocabulary plus tied embeddings.
Why it looks promising:
- vocab size and output-head cost hit the artifact budget directly
- the challenge metric is tokenizer-agnostic bits per byte, which changes the usual tokenizer tradeoffs
- a larger vocabulary can shorten sequences while inflating output-side storage, so the tokenizer and the head must be designed together
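The tension above can be sketched in a few lines. Both function names are hypothetical; the bits-per-byte formula follows the common definition (total negative log-likelihood normalized by raw UTF-8 byte count), which may differ in detail from the challenge's exact scoring.

```python
import math

def bits_per_byte(total_nll_nats: float, n_utf8_bytes: int) -> float:
    """Tokenizer-agnostic loss: total NLL in nats over a text, normalized
    by its raw UTF-8 byte count. Shorter token sequences do not help
    unless per-token loss drops enough to compensate."""
    return total_nll_nats / (n_utf8_bytes * math.log(2))

def embedding_head_bytes(vocab_size: int, d_model: int,
                         bytes_per_param: float = 2.0, tied: bool = True) -> float:
    """Artifact bytes spent on token embeddings plus the LM head.
    Tying stores one matrix instead of two."""
    copies = 1 if tied else 2
    return vocab_size * d_model * bytes_per_param * copies
```

For illustration, a 32k vocabulary at width 512 in 2-byte precision costs roughly 33 MB tied and 66 MB untied, which is why vocab size hits a capped artifact budget so directly.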
Best links:
- tokenizer and vocabulary efficiency
- Tokenizer efficiency
- The LM head is part of the compression problem
- ReTok
- Vocabulary Compression
- Beyond Text Compression
Public-status note:
- already weakly visible in baseline form
- not yet publicly explored in a more aggressive or clearly novel way
Family 4: compute-for-bytes exchanges at evaluation time
The challenge explicitly allows bounded evaluation methods as long as they stay within the rules.
Why it looks promising:
- evaluation can spend time to recover capability without storing more weights
- this is especially attractive if the best static artifact is still too small to express everything directly
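As a toy analogy (not a claim about any actual entry), here is the shape of a compute-for-storage exchange: the stored "artifact" is a single tiny update rule, and evaluation-time iterations recover precision that was never stored. The name `refine` is hypothetical.

```python
def refine(x: float, step, iters: int) -> float:
    """Apply one stored update rule repeatedly: extra evaluation-time
    compute, zero extra stored state."""
    for _ in range(iters):
        x = step(x)
    return x

# Newton's iteration x -> (x + a/x) / 2 converges to sqrt(a). The stored
# artifact is just the rule; accuracy is bought with iterations.
approx = refine(1.0, lambda x: (x + 2.0 / x) / 2.0, iters=20)
assert abs(approx - 2.0 ** 0.5) < 1e-12
```

Depth-recurrent models admit the same move directly: run the shared block for more iterations at evaluation than the artifact's nominal depth, within whatever compute bound the rules allow.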
Best links:
- evaluation-time compute and inference scaling
- Compute-for-storage exchange
- Iterative refinement over stored depth
- Inference Scaling Laws
Public-status note:
- explicitly invited by challenge framing
- not yet visibly demonstrated in the public runs summarized here
Family 5: training-budget exploitation without artifact redesign
This family is already visible in a minimal sense through the unlimited-compute non-record run.
Why it matters:
- it asks how much quality can still be extracted from a fixed artifact family
- it helps separate “need a better artifact” from “need a better optimization path”
Best links:
- training economics and small-model bottlenecks
- Computational Bottlenecks of Training SLMs
- 4-Hour Quasi-10B SP1024
Public-status note:
- publicly demonstrated in a narrow form
- unlikely to be the whole story if artifact-centered methods improve
Summary judgment
If the public field remains sparse, the safest synthesis is:
- the challenge already rewards artifact-aware discipline
- the strongest-looking future families are those that trade stored uniqueness for compute, selective precision, or better token/head economics
- the public record is still too early to declare a dominant recipe