A lot of recent low-bit work can be re-read as one deeper claim:

the real optimization target is not model error at a nominal bit-width; it is recovered quality under a total byte budget.

That sounds obvious in Parameter Golf, but most methods still enter through a narrower door such as “3-bit PTQ,” “mixed precision,” or “outlier handling.” The strongest new seam is to unify those under rate-distortion thinking.

Why this seam matters now

Several recent papers are converging from different directions:

  • Radio makes the rate-distortion framing explicit. (Young, 2025)
  • OWQ shows that a tiny protected subset can dominate the quality curve. (Lee et al., 2024)
  • ReALLM widens the design space from quantized matrices to budgeted latent-plus-residual representations. (Leconte et al., 2024)
  • ClusComp and pQuant reinforce that saliency curves are steep and that structure-aware protection matters.

The shared lesson is stronger than any one implementation: byte spending itself should be treated as a first-class optimization target.
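The protected-subset effect is easy to demonstrate in miniature. The sketch below is not the actual OWQ algorithm (OWQ ranks weak columns with a Hessian-based sensitivity measure); it uses a hypothetical activation-scaled magnitude proxy for saliency. But it shows the same shape of result: restoring a handful of outlier columns at full precision removes their disproportionate share of the quantization error, at the cost of a few stored columns.

```python
import numpy as np

def quantize_sym(w, bits):
    """Per-column symmetric uniform quantization (illustrative)."""
    scale = np.abs(w).max(axis=0, keepdims=True) / (2 ** (bits - 1) - 1)
    scale[scale == 0] = 1.0
    return np.round(w / scale) * scale

def selective_quantize(W, act_norm, bits=3, keep_frac=0.01):
    """Quantize W to `bits`, but keep the top keep_frac most salient
    columns at full precision. Saliency is a hypothetical proxy:
    per-column activation norm times summed squared weight magnitude."""
    saliency = act_norm * (W ** 2).sum(axis=0)
    n_keep = max(1, int(keep_frac * W.shape[1]))
    keep = np.argsort(saliency)[-n_keep:]   # protected column indices
    Wq = quantize_sym(W, bits)
    Wq[:, keep] = W[:, keep]                # restore protected columns
    return Wq, keep

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 256))
W[:, 5] *= 50.0                             # plant one outlier column
Wq, keep = selective_quantize(W, np.ones(256))
```

The outlier column lands in the protected set, and total error drops versus plain 3-bit quantization at essentially the same nominal width.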

The cross-paper connection

These papers differ in mechanics, but they rhyme on four points:

  1. distortion is highly non-uniform across the model
  2. the best protected bytes are usually a small minority
  3. side-information and metadata can erase naive gains
  4. structure that is easy to encode often beats theoretically nicer but irregular exceptions

That creates a more precise frontier than “mixed precision”:

can we estimate a byte-return curve for each kind of exception path and spend bytes only where the slope is steep enough?
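One way to operationalize that question: model each exception path as a byte-return curve and allocate the budget greedily by marginal slope, which is optimal when the curves are concave. Everything below is illustrative, not drawn from any of the cited papers: the path names, the log-shaped curves, and the chunk size are all hypothetical.

```python
import heapq
import math

def greedy_allocate(return_curves, budget, chunk=1024):
    """Greedy marginal-return allocation over assumed-concave curves.

    return_curves: dict name -> f(bytes_spent) = estimated quality gain.
    Repeatedly spend `chunk` bytes on whichever path currently has the
    steepest marginal slope."""
    spent = {name: 0 for name in return_curves}

    def slope(name):
        f, b = return_curves[name], spent[name]
        return (f(b + chunk) - f(b)) / chunk

    heap = [(-slope(n), n) for n in return_curves]
    heapq.heapify(heap)
    remaining = budget
    while remaining >= chunk:
        _, name = heapq.heappop(heap)
        spent[name] += chunk
        remaining -= chunk
        heapq.heappush(heap, (-slope(name), name))  # re-rank this path
    return spent

# Hypothetical diminishing-return curves for three exception paths.
curves = {
    "outlier_columns": lambda b: 10 * math.log1p(b / 256),
    "residual_codes":  lambda b: 4 * math.log1p(b / 4096),
    "per_row_scales":  lambda b: 1 * math.log1p(b / 1024),
}
alloc = greedy_allocate(curves, budget=64 * 1024)
```

With these curves, the path with the steepest early returns absorbs most of the budget, and the flattest path gets only a few chunks before its slope falls below the others.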

What this means for Parameter Golf

A hard-cap challenge rewards methods that can answer all of these at once:

  • which object gets protected: tensor, row, column, channel, codebook, residual
  • how expensive the side-information is
  • whether the resulting structure still compresses well after final packing
  • whether the protected path can be amortized across repeated/shared blocks
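The first two bullets reduce to byte accounting. The sketch below (all sizes and bit-widths hypothetical) prices an exception path as index metadata plus the precision upgrade on every weight inside the protected unit, which is what makes column-level protection orders of magnitude cheaper than promoting a whole tensor.

```python
def protection_cost_bytes(n_rows, n_cols, unit, n_protected,
                          hi_bits=16, lo_bits=3, index_bits=16):
    """Extra bytes spent on one protected exception path (illustrative).

    unit: granularity of protection ("tensor", "row", or "column").
    Cost = index metadata for the protected units, plus the precision
    upgrade (hi_bits - lo_bits) on every weight inside them."""
    elems_per_unit = {"tensor": n_rows * n_cols,
                      "row": n_cols,
                      "column": n_rows}[unit]
    index_bytes = 0 if unit == "tensor" else n_protected * index_bits // 8
    upgrade_bytes = n_protected * elems_per_unit * (hi_bits - lo_bits) // 8
    return index_bytes + upgrade_bytes

# Protecting 40 of 4096 columns in a 4096x4096 weight matrix:
col_cost = protection_cost_bytes(4096, 4096, "column", 40)   # ~260 KiB
# Promoting the whole tensor instead:
full_cost = protection_cost_bytes(4096, 4096, "tensor", 1)   # ~26 MiB
```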

This is why Radio and OWQ feel unusually relevant together: one supplies the allocation principle, the other supplies a plausible unit of protection.

A falsifiable thesis

Thesis: once models are already in the aggressive low-bit regime, the next meaningful gains will come from better byte-return ranking and budgeting, not from globally lowering nominal quantization error.

What would support it

  • equal-byte selective schemes beat globally better quantizers
  • the best protected subset is tiny and saturates quickly
  • structured exception formats survive final artifact coding better than irregular ones
  • repeated/shared backbones make protected side paths even more valuable because their cost amortizes
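The first supporting test only means something at matched budgets. A small helper (illustrative; bit-widths are nominal averages, and index metadata and packing overhead are ignored) makes the price of selectivity explicit: holding 1% of weights at 16 bits forces the remaining 99% slightly below the uniform baseline's width, and the thesis predicts that trade still wins.

```python
def equal_byte_widths(n_params, budget_bytes, keep_frac, hi_bits=16):
    """At a fixed byte budget, compare the bit-width a uniform quantizer
    affords against what the unprotected weights get once keep_frac of
    parameters are held at hi_bits (illustrative)."""
    uniform_bits = 8 * budget_bytes / n_params
    n_keep = keep_frac * n_params
    selective_bits = (8 * budget_bytes - n_keep * hi_bits) / (n_params - n_keep)
    return uniform_bits, selective_bits

# A 1B-parameter model under a 500 MB cap, protecting 1% at 16-bit:
u_bits, s_bits = equal_byte_widths(10**9, 5 * 10**8, keep_frac=0.01)
# uniform baseline: 4.0 bits everywhere; selective scheme: ~3.88 bits
# on the other 99% of the weights
```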

What would falsify it

  • saliency is too diffuse for small protected subsets to matter
  • ranking noise is too unstable across seeds or workloads
  • metadata costs erase the selective-precision win
  • execution constraints reward simpler uniform formats despite worse distortion

The interesting hidden connection

This frontier links the quantization lane to the recursive-sharing lane. If Dynamic Layer Tying or Relaxed Recursive Transformers reduce the number of unique stored blocks, the freed bytes can be reinvested into a far more selective protected path.

That composition may be stronger than either idea alone:

  • fewer unique stored blocks
  • richer exception handling where it matters
  • better total quality at the same final artifact size
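The composition is just arithmetic, which the sketch below makes explicit (layer counts, block size, and bit-widths all hypothetical): collapsing 32 layers to 8 unique 50 MiB blocks frees about 1.2 GB, enough to promote a very large number of columns from 3-bit to 16-bit under the same artifact cap.

```python
def freed_bytes(n_layers, n_unique, bytes_per_block):
    """Bytes saved when n_layers share n_unique stored blocks."""
    return (n_layers - n_unique) * bytes_per_block

def affordable_protected_columns(freed, n_rows, hi_bits=16, lo_bits=3):
    """Extra protected columns the freed bytes buy, if each column
    upgrades n_rows weights from lo_bits to hi_bits (illustrative)."""
    col_cost = n_rows * (hi_bits - lo_bits) // 8
    return freed // col_cost

# 32 layers collapsed to 8 unique 50 MiB blocks:
freed = freed_bytes(32, 8, 50 * 2**20)      # 1200 MiB freed
cols = affordable_protected_columns(freed, n_rows=4096)
```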

Bottom line

The next serious seam is not just “find a better quantizer.” It is to treat every extra stored byte as capital that must earn its return.

That framing is much closer to the actual challenge objective than comparing methods by advertised bit-width class.

Leconte, L., Bedin, L., Nguyen, V. M., & Moulines, E. (2024). ReALLM: A General Framework for LLM Compression and Fine-Tuning. arXiv Preprint arXiv:2405.13155. https://arxiv.org/abs/2405.13155
Lee, C., Jin, J., Kim, T., Kim, H., & Park, E. (2024). OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models. arXiv Preprint arXiv:2306.02272. https://arxiv.org/abs/2306.02272
Young, S. I. (2025). Radio: Rate-Distortion Optimization for Large Language Model Compression. arXiv Preprint arXiv:2505.03031. https://arxiv.org/abs/2505.03031