Hypothesis

In compact language models, the LM head and vocabulary path may account for enough of the byte and compute budget that compressing or restructuring them buys more than a comparable improvement to the transformer backbone would.

Why this is plausible

Three facts point in the same direction:

  • tokenizer choice affects how large the vocabulary and logits path must be
  • compact models can hit vocab and LM-head bottlenecks surprisingly early
  • a backbone improvement is partly wasted if the output path remains the dominant cost center
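To make the size argument concrete, here is a back-of-envelope sketch. All dimensions (512-wide, 8 layers, 50k vocabulary, untied head) are illustrative assumptions, not measurements of any specific model:

```python
# Rough parameter shares for a hypothetical compact decoder-only LM.
# All dimensions below are illustrative assumptions.

def lm_param_counts(d_model, n_layers, vocab_size, tied=False):
    """Split rough parameter counts into backbone vs. vocabulary path."""
    # Per layer: attention (~4 * d^2) + MLP with 4x expansion (~8 * d^2).
    backbone = n_layers * 12 * d_model ** 2
    embed = vocab_size * d_model                 # input embedding table
    head = 0 if tied else vocab_size * d_model   # untied output projection
    vocab_path = embed + head
    return backbone, vocab_path, backbone + vocab_path

# A small model: d=512, 8 layers, 50k vocabulary, untied head.
backbone, vocab_path, total = lm_param_counts(512, 8, 50_000)
print(f"backbone:   {backbone / 1e6:6.1f}M ({backbone / total:.0%})")
print(f"vocab path: {vocab_path / 1e6:6.1f}M ({vocab_path / total:.0%})")
```

Under these assumptions the vocabulary path is roughly two thirds of the model, which is the regime the hypothesis cares about; tying the head to the embedding already halves it.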

This is the natural hypothesis version of "The LM head is part of the compression problem."

Candidate mechanisms

  • smaller or more targeted vocabularies
  • factorized or low-rank output projections
  • compression schemes that treat the LM head separately from the rest of the model
  • tying, clustering, or codebook-like structure in the output path
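One of the mechanisms above, a factorized low-rank output projection, can be sketched in a few lines of NumPy. The dimensions and rank are illustrative assumptions, and this shows only the shape and parameter arithmetic, not a trained head:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, rank = 256, 20_000, 64   # illustrative sizes

# Full head: one d_model x vocab projection matrix.
W_full = rng.standard_normal((d_model, vocab)) * 0.02

# Factorized head: d_model -> rank -> vocab, two smaller matrices.
A = rng.standard_normal((d_model, rank)) * 0.02
B = rng.standard_normal((rank, vocab)) * 0.02

h = rng.standard_normal((1, d_model))    # a single hidden state
logits_full = h @ W_full                 # shape (1, vocab)
logits_fact = (h @ A) @ B                # same shape, fewer parameters

full_params = d_model * vocab
fact_params = rank * (d_model + vocab)
print(f"full head:       {full_params / 1e6:.1f}M params")
print(f"factorized head: {fact_params / 1e6:.1f}M params "
      f"({fact_params / full_params:.0%} of full)")
```

The factorization trades expressivity (logits are constrained to a rank-64 subspace) for roughly a 4x parameter reduction at these sizes; whether that trade is acceptable is exactly what the rare-token risk below is about.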

What would support it

  • the same storage budget yielding better end-task performance when bytes move from the LM head into the backbone, or vice versa
  • tokenizer changes altering the best backbone design more than expected
  • compact models whose main bottleneck turns out to be the logits path rather than hidden-state capacity
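The first bullet suggests a concrete experiment shape: hold total parameters fixed and sweep how the budget splits between vocabulary path and backbone. A minimal sketch, with the budget and dimensions as illustrative assumptions:

```python
# Fixed-budget sweep: under a constant total parameter budget, how many
# backbone layers fit as the vocabulary (and hence LM-head size) shrinks?
# The 100M budget and d=512 are illustrative assumptions.

def layers_under_budget(total_params, d_model, vocab_size, tied=False):
    """Backbone layers affordable after paying for the vocabulary path."""
    vocab_path = vocab_size * d_model * (1 if tied else 2)
    remaining = total_params - vocab_path
    per_layer = 12 * d_model ** 2        # attention + 4x MLP, rough
    return max(remaining // per_layer, 0)

BUDGET, D = 100_000_000, 512
for vocab in (100_000, 50_000, 25_000):
    n = layers_under_budget(BUDGET, D, vocab)
    print(f"vocab {vocab:>7,}: {n} layers fit in a {BUDGET / 1e6:.0f}M budget")
```

At these sizes a 100k untied vocabulary leaves no room for a backbone at all, while halving the vocabulary from 50k to 25k buys several extra layers; the hypothesis predicts that somewhere along this sweep the smaller-head configurations win on end-task quality.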

Main risks

  • token counts rise enough to erase the savings, since a smaller vocabulary segments the same text into more tokens
  • factorized or compressed heads damage rare-token behavior too much
  • gains depend on a tokenizer-data match that does not generalize beyond the training distribution
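The first risk can be checked with per-sequence compute arithmetic: a smaller vocabulary shrinks the logits matmul but lengthens tokenizations, so backbone compute grows. The fertility numbers (tokens per word) and dimensions here are illustrative assumptions; whether the savings survive depends entirely on how much fertility rises:

```python
# Sketch of the token-count risk: shrinking the vocabulary shrinks the
# LM-head matmul, but longer tokenizations inflate backbone compute.
# Fertility rates (tokens per word) are illustrative assumptions.

def seq_flops(n_tokens, d_model, n_layers, vocab_size):
    """Rough forward-pass FLOPs for one sequence (~2 FLOPs per parameter)."""
    backbone = n_tokens * n_layers * 24 * d_model ** 2
    head = n_tokens * 2 * d_model * vocab_size   # logits projection
    return backbone + head

D, L, WORDS = 512, 8, 1_000
big_vocab = seq_flops(int(WORDS * 1.3), D, L, 50_000)    # 1.3 tokens/word
small_vocab = seq_flops(int(WORDS * 1.8), D, L, 8_000)   # 1.8 tokens/word
print(f"50k vocab, 1.3 tok/word: {big_vocab / 1e9:.2f} GFLOPs")
print(f" 8k vocab, 1.8 tok/word: {small_vocab / 1e9:.2f} GFLOPs")
```

Under these particular assumptions the small vocabulary still comes out ahead, but by much less than the head reduction alone would suggest; a steeper fertility increase, or a deeper backbone, would flip the sign.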