This layer is for frontier synthesis, not generic paper summaries. Each note below tries to connect multiple recent papers into a falsifiable claim about where the next non-obvious Parameter Golf gains may come from.
The goal is to answer one question: what research seam looks underexploited now, and what concrete prediction would let us kill or promote it quickly?
How to use this layer
- Start here when paper notes feel too local and lane pages feel too broad.
- Treat each frontier as a cross-paper mechanism thesis.
- Prefer notes that say what would disconfirm the idea, not just why it sounds exciting.
Highest-leverage seams right now
1. Byte allocation beats average bit-width
Why it matters: the newest low-bit papers increasingly win by deciding which parameters deserve protection, not by globally lowering quantization error. This is the strongest bridge from pQuant, PTQ1.61, MicroScopiQ, and ClusComp.
Best fit with existing graph: quantization and outliers, sparse outlier preservation, Decoupled precision.
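A minimal sketch of the allocation claim, with all numbers made up: a toy weight vector with a small-variance bulk plus a few outliers, where protecting the outliers at full precision (paying index metadata) and spending only 3 bits on the bulk beats uniform 4-bit on both byte count and reconstruction error. The ratios, the fp16/2-byte-index costs, and the quantizer are illustrative assumptions, not any paper's scheme.

```python
import random

random.seed(0)
N = 1000

# Toy weight vector: a small-variance bulk plus a few large outliers.
weights = [random.gauss(0.0, 0.02) for _ in range(N)]
for i in random.sample(range(N), 10):
    weights[i] = random.gauss(0.0, 1.0)

def quantize(ws, bits, lo, hi):
    """Uniform quantization into 2**bits levels over [lo, hi]."""
    levels = 2 ** bits - 1
    out = []
    for w in ws:
        t = min(max((w - lo) / (hi - lo), 0.0), 1.0)
        out.append(lo + round(t * levels) / levels * (hi - lo))
    return out

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Scheme A: uniform 4-bit over the full, outlier-stretched range.
lo, hi = min(weights), max(weights)
uniform = quantize(weights, 4, lo, hi)
bytes_uniform = N * 4 / 8                        # 500 bytes

# Scheme B: 3-bit for the bulk over a tight range; the 10 largest-
# magnitude weights kept at fp16 plus 2 bytes of index metadata each.
protected = set(sorted(range(N), key=lambda i: abs(weights[i]))[-10:])
bulk = [w for i, w in enumerate(weights) if i not in protected]
mixed = quantize(weights, 3, min(bulk), max(bulk))
for i in protected:
    mixed[i] = weights[i]                        # stored at full precision
bytes_mixed = (N - 10) * 3 / 8 + 10 * (2 + 2)    # ~411 bytes

print(bytes_mixed < bytes_uniform)                    # fewer bytes...
print(mse(weights, mixed) < mse(weights, uniform))    # ...and lower error
```

The point of the toy: uniform quantization pays for the outlier-stretched range on every parameter, while targeted protection pays metadata only where it buys error back.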
2. Compression interfaces for shared depth
Why it matters: recent recursion papers and low-bit-stability papers point at the same hidden problem: repeated blocks become much more viable if their inputs and role shifts are explicitly normalized and lightly conditioned. This is where Extra RMSNorm, QuEST, Relaxed Recursive Transformers, MoEUT, and Fine-grained Parameter Sharing start to rhyme.
Best fit with existing graph: recursive sharing, recursive width scaling, RMSNorm stabilized scaling.
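The interface claim can be shown with a deliberately tiny toy: a single tied 2x2 block with gain above 1, reused eight times. The matrix, pass count, and input are all hypothetical. Without a norm at the reuse boundary the activation scale compounds; an RMSNorm at the interface resets the scale each pass, which is the stability property repeated blocks need.

```python
import math

# Hypothetical 2x2 tied block with gain above 1; W, the pass count,
# and the input vector are all invented for illustration.
W = [[1.4, 0.3], [0.2, 1.5]]

def rmsnorm(x, eps=1e-6):
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]

def block(x):
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

raw = normed = [1.0, -1.0]
for _ in range(8):                    # the same weights reused at every depth
    raw = block(raw)                  # activation scale compounds each reuse
    normed = block(rmsnorm(normed))   # an interface norm resets the scale

print(max(map(abs, raw)) > 10)       # unnormalized reuse blows up
print(max(map(abs, normed)) < 3)     # normalized interface stays bounded
```

The same mechanism is why conditioning at the reuse boundary, not inside the block, is the cheap place to spend a few extra parameters.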
3. Tokenizer-head co-design under a hard cap
Why it matters: the recent tokenizer papers do not say “smaller token count wins.” They say tokenization must be judged jointly with LM-head cost, logits cost, and domain fit. That makes ReTok, Vocabulary Compression, Beyond Text Compression, and Plan Early much more relevant than a standard tokenizer discussion would suggest.
Best fit with existing graph: tokenizer and vocabulary efficiency, training economics, Tokenizer efficiency.
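The joint-accounting point reduces to simple arithmetic. Everything below is a hypothetical placeholder: a 512-dim model with an untied fp16 embedding table and LM head, and invented tokens-per-character rates for three vocabulary sizes.

```python
# All numbers are hypothetical placeholders, not measured tokenizer stats.
def embed_and_head_bytes(vocab_size, d_model=512, bytes_per_param=2):
    # Untied fp16 embedding table plus LM head.
    return 2 * vocab_size * d_model * bytes_per_param

def joint_cost(vocab_size, tokens_per_char, doc_chars=1_000_000):
    # A tokenizer must be judged on both axes at once: parameter bytes
    # spent on the vocabulary, and tokens processed for a fixed document.
    return embed_and_head_bytes(vocab_size), tokens_per_char * doc_chars

for vocab, tpc in [(8_000, 0.45), (32_000, 0.30), (128_000, 0.26)]:
    param_bytes, tokens = joint_cost(vocab, tpc)
    print(f"vocab={vocab:>6}: {param_bytes/1e6:6.1f} MB head+embed, "
          f"{tokens/1e6:.2f}M tokens")
```

Under a hard artifact cap, the last vocabulary quadrupling buys a small token-count improvement at a large byte cost, which is exactly why "smaller token count wins" is the wrong single-axis metric.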
4. Entropy-friendly model structure
Why it matters: some compression ideas look good in nominal bit-width and still lose once metadata and final coding are counted. The more interesting seam is whether we can design model structure that a downstream codec naturally likes: repeated bases, clustered values, shared blocks, and low-entropy exception patterns.
Best fit with existing graph: Additive Quantization, ClusComp, Fine-grained Parameter Sharing, quantization and outliers, recursive sharing.
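The nominal-versus-coded gap can be demonstrated with a general-purpose codec: two weight streams with identical nominal size (one byte per value), where one is spread over 256 levels and the other is clustered onto a small hypothetical codebook, as cluster-style compression or parameter sharing would produce.

```python
import random
import zlib

random.seed(0)
n = 4096

# Two streams with the same nominal size: one byte per value.
uniform_vals = bytes(random.randrange(256) for _ in range(n))   # 256 levels
codebook = [3, 97, 140, 201]                                    # hypothetical
clustered_vals = bytes(random.choice(codebook) for _ in range(n))

coded_uniform = len(zlib.compress(uniform_vals, 9))
coded_clustered = len(zlib.compress(clustered_vals, 9))

print(coded_clustered < coded_uniform)   # the codec prefers clustered values
print(coded_clustered < n // 2)          # well under the nominal 4 KiB
```

Only the final coded size counts against a byte cap, so model structure that a downstream entropy coder likes is worth real bytes even at the same nominal bit-width.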
5. Refinement loops as decompression
Why it matters: Inference Scaling Laws and Plan Early suggest that once storage is the bottleneck, extra evaluation-time compute can act like recovered capacity. For Parameter Golf, the frontier question is whether a compact recurrent model can use bounded extra passes to reconstruct some of the behavior that would otherwise have required more stored weights.
Best fit with existing graph: evaluation-time compute, recurrent wide architecture, recursive sharing.
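The compute-for-storage trade has a classical miniature: a small contractive block applied repeatedly converges to the output of a larger operator that was never stored. The 2x2 matrix, bias, and pass count below are invented stand-ins for a compact weight-tied model.

```python
# Hypothetical numbers: a 2x2 contractive block A and bias b stand in for
# a compact weight-tied model; the fixed point of x <- Ax + b equals the
# output of the larger, never-stored operator (I - A)^-1 b.
A = [[0.2, 0.1], [0.05, 0.3]]
b = [1.0, 2.0]

def refine(x):
    """One bounded eval-time pass through the shared block."""
    return [sum(A[i][j] * x[j] for j in range(2)) + b[i] for i in range(2)]

x = [0.0, 0.0]
for _ in range(30):            # extra passes instead of extra stored weights
    x = refine(x)

# At the fixed point another pass changes nothing: x solves (I - A) x = b.
residual = max(abs(x[i] - refine(x)[i]) for i in range(2))
print(residual < 1e-9)
```

The frontier question is whether this trade survives at model scale, where the "larger operator" is behavior rather than a matrix inverse and the pass budget is bounded by wall-clock constraints.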
6. Rate-distortion for artifact caps
Why it matters: Radio, OWQ, and ReALLM all imply that the strongest next wins may come from explicit byte-return accounting rather than nominal low-bit branding. This is the cleanest current bridge between information-theoretic allocation and concrete protected-structure design.
Best fit with existing graph: quantization and outliers, byte allocation beats average bit-width, entropy-friendly model structure.
7. Learned weight codecs and compressible training
Why it matters: the newest papers are starting to move beyond post-hoc low-bit tricks toward optimizer choices, training objectives, and learned weight representations that directly target future compressibility. This is the strongest current bridge from “better quantizer” to “better artifact ontology.”
Best fit with existing graph: moonshots, training economics, rate-distortion for artifact caps.
Ranking by evidence quality
Strongest evidence density
- Byte allocation beats average bit-width
- Learned weight codecs and compressible training
- Compression interfaces for shared depth
Most upside if true
- Learned weight codecs and compressible training
- Compression interfaces for shared depth
- Refinement loops as decompression
Highest risk of overfitting or complexity blow-up
- Learned weight codecs and compressible training
- Refinement loops as decompression
- Entropy-friendly model structure
A useful reading order
- Byte allocation beats average bit-width
- Rate-distortion for artifact caps
- Learned weight codecs and compressible training
- Compression interfaces for shared depth
- Entropy-friendly model structure
- Tokenizer-head co-design under a hard cap
- Refinement loops as decompression
That order moves from explicit byte-allocation logic to learned artifact formats, then to architectural and compute-for-storage ideas.
What would count as real progress
A frontier note is doing its job if it helps us say one of the following:
- “this cross-paper thesis predicts a measurable win in our local benchmark”
- “this only sounds good until metadata or wall-clock constraints from the challenge are counted”
- “this lane should be split because two mechanisms that looked related are actually in tension”