A recurring mistake in compression discussions is to compare methods mainly by nominal bit-width or reconstruction quality before the cost of the final artifact format is counted.

The more interesting seam across Additive Quantization, ClusComp, Fine-grained Parameter Sharing, PTQ1.61, and MicroScopiQ is this:

the winning model may be the one whose structure is easiest for the whole storage pipeline to exploit, not the one with the prettiest local quantizer.

Why this seam matters now

Several recent papers are converging on “non-uniformity,” but they do not all mean the same thing by it.

Put together, these papers point past “low-bit weights” toward a different question:

what kinds of model structure naturally compress well once values, indices, bases, masks, and repeated patterns all enter the artifact?

The key synthesis

There are at least three kinds of useful regularity:

1. Value regularity

Weights fall onto a small set of repeated or codebook-like values.

Natural paper bridge: Additive Quantization and ClusComp, both of which reconstruct weights from shared codebook-like values.
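Value regularity can be sketched with a toy codebook quantizer in the spirit of these methods (not either paper's actual algorithm; all sizes are invented): a numpy-only k-means fits 16 shared values, and the artifact cost becomes 4-bit indices plus the codebook.

```python
import numpy as np

def kmeans_codebook(weights, k=16, iters=10, seed=0):
    """Fit a small 1-D k-means codebook over a flat weight vector (toy Lloyd's)."""
    rng = np.random.default_rng(seed)
    w = weights.ravel()
    centers = rng.choice(w, size=k, replace=False)
    for _ in range(iters):
        idx = np.abs(w[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            members = w[idx == j]
            if members.size:
                centers[j] = members.mean()
    # final assignment against the final centers
    idx = np.abs(w[:, None] - centers[None, :]).argmin(axis=1)
    return centers, idx

rng = np.random.default_rng(1)
w = rng.normal(size=4096).astype(np.float32)
centers, idx = kmeans_codebook(w, k=16)

# Artifact accounting: 4-bit indices plus a 16-entry fp32 codebook,
# versus 4 bytes per weight stored raw.
index_bytes = idx.size * 4 / 8
codebook_bytes = centers.size * 4
print(index_bytes + codebook_bytes, w.nbytes)  # codebook form vs raw fp32
```

The point is not the quantizer itself but the accounting: indices and codebook both enter the artifact, and repeated indices are exactly what downstream coding can exploit.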

2. Basis regularity

Multiple tensors are explained by a small shared basis plus cheap corrections.

Natural paper bridge: Fine-grained Parameter Sharing, which explains many tensors via shared decompositions plus sparsity.
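A minimal sketch of basis regularity, with made-up shapes rather than anything from the cited paper: several synthetic tensors share a hidden basis, one SVD fits a shared basis across all of them, and each tensor is then stored as per-tensor coefficients plus a small residual correction.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=(64, 8))                 # shared structure we hope to recover
tensors = [hidden @ rng.normal(size=(8, 32)) + 0.01 * rng.normal(size=(64, 32))
           for _ in range(4)]

# Fit ONE basis across all tensors (SVD of the concatenation), then explain
# each tensor as basis @ coeffs plus a residual.
stacked = np.concatenate(tensors, axis=1)         # 64 x 128
U, _, _ = np.linalg.svd(stacked, full_matrices=False)
basis = U[:, :8]                                  # shared 8-dim basis

rels = []
for t in tensors:
    coeffs = basis.T @ t                          # cheap per-tensor coefficients
    resid = t - basis @ coeffs                    # what the basis cannot explain
    rels.append(float(np.linalg.norm(resid) / np.linalg.norm(t)))
print([round(r, 3) for r in rels])
```

The storage win comes from the basis being paid for once while each tensor pays only for coefficients and a (hopefully compressible) residual.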

3. Exception regularity

The “special cases” that need better treatment form patterns that can themselves be compressed well.

Natural paper bridge: PTQ1.61 and MicroScopiQ, which both pay structured metadata for outliers and other special cases.
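A toy illustration of why exception patterns matter, with invented sizes: two exception masks with identical sparsity, one scattered at arbitrary positions and one confined to contiguous blocks, cost very different amounts once a generic codec (zlib here) sees the metadata.

```python
import numpy as np
import zlib

n, k = 65536, 1024                      # weights, exceptions (made-up sizes)
rng = np.random.default_rng(0)

# Scattered exceptions: arbitrary positions.
scattered = np.zeros(n, dtype=np.uint8)
scattered[rng.choice(n, size=k, replace=False)] = 1

# Clustered exceptions: the same count, confined to 16 aligned 64-wide blocks.
clustered = np.zeros(n, dtype=np.uint8)
for s in rng.choice(n // 64, size=16, replace=False) * 64:
    clustered[s:s + 64] = 1

# Same nominal sparsity, very different metadata entropy after a generic codec.
s_len = len(zlib.compress(scattered.tobytes()))
c_len = len(zlib.compress(clustered.tobytes()))
print(s_len, c_len)
```

Both masks mark exactly 1024 exceptions, but the clustered one compresses to a fraction of the scattered one's size, which is the "exception regularity" being claimed.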

The frontier claim is that these three regularities may compose better than they look in separate literatures.

A falsifiable thesis

Thesis: under a strict size cap, models whose deviations from cheap structure are regular and clustered will beat models with slightly better local quantization error but more irregular metadata.

In other words, artifact success may depend more on entropy of exceptions than on average quantization error.

What would support it

  • clustered or basis-based exception formats beat equally sized random sparse exceptions after final compression
  • repeated bases and repeated codebooks survive the artifact pipeline unusually well
  • methods with modestly worse pre-artifact error win after the full codec is applied

What would falsify it

  • final compression tracks nominal quantizer quality closely, with little bonus for regular structure
  • metadata costs stay negligible even for irregular exception patterns
  • regularity constraints hurt model quality too much to pay back

The strongest new idea hiding here

A promising Parameter Golf direction is storage-native model design.

That means we stop thinking of compression as the final stage applied to a finished model. Instead we ask whether the model can be designed so its learned structure already looks favorable to downstream coding.

Examples of what that might mean:

  • a shared recurrent block with tiny regular phase adapters instead of many unique layers
  • grouped or codebook-like residuals whose indices repeat heavily
  • protected subsets chosen in coarse blocks or rows rather than arbitrary masks
  • basis-plus-delta decompositions where both the basis and the correction formats repeat across tensors
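The first bullet can be sketched concretely; the width, depth, rank, and the `forward` helper below are all invented for illustration, not a proposed architecture: one shared matrix is reused at every depth step with a tiny low-rank "phase adapter" per step, and the parameter count is compared against a fresh matrix per layer.

```python
import numpy as np

d, depth, r = 256, 12, 4                 # width, unrolled depth, adapter rank (made up)
rng = np.random.default_rng(0)

# One shared block reused at every depth step...
shared = rng.normal(size=(d, d)) / np.sqrt(d)
# ...plus a tiny low-rank adapter per step.
adapters = [(rng.normal(size=(d, r)) / d, rng.normal(size=(r, d)) / d)
            for _ in range(depth)]

def forward(x):
    for a, b in adapters:
        x = np.tanh(x @ (shared + a @ b))   # shared weights + cheap per-step delta
    return x

unique_params = depth * d * d               # baseline: a fresh matrix per layer
shared_params = d * d + depth * 2 * d * r   # storage-native: one matrix + adapters
print(shared_params / unique_params)        # fraction of the baseline's parameters
```

The shared matrix is stored once and the adapters are both small and structurally repetitive, which is exactly the kind of repetition a storage pipeline can exploit.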

This is where the quantization-and-outliers line of work meets recursive sharing in a way the current graph only hints at.

Why this frontier is different from “use better compression”

“Use better compression” usually means a smarter algorithm at the end. This frontier instead says the model should be judged by how well it cooperates with compression.

That shifts the research question from:

  • which post-hoc codec is best?

to:

  • what learned structure keeps both values and metadata low-entropy?
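That second question can at least be measured. A small sketch with an assumed `stream_entropy` helper: heavily repeated codebook indices carry far fewer bits per symbol than uniform indices over the same alphabet, so the same nominal 4-bit format can have very different final cost.

```python
import numpy as np

def stream_entropy(symbols):
    """Empirical Shannon entropy in bits per symbol of a discrete stream."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# Low-entropy structure: heavily repeated codebook indices.
repeated = rng.choice(16, size=8192, p=np.array([0.5] + [0.5 / 15] * 15))
# High-entropy structure: uniform indices over the same 16-symbol alphabet.
uniform = rng.integers(0, 16, size=8192)

print(round(stream_entropy(repeated), 2), round(stream_entropy(uniform), 2))
```

Both streams are "4-bit indices" on paper; the entropy gap is the headroom an entropy coder in the artifact pipeline can actually collect.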

The second question is much closer to a true Parameter Golf objective.

Important cautions

This seam is easy to romanticize. It can fail in boring ways:

  • codebooks and indices may cost more than they save
  • regularity constraints may destroy important rare structure
  • the “entropy-friendly” model may become harder to train than a less structured baseline
  • gains may be specific to one artifact pipeline and not robust

So the point is not to assume regularity helps. The point is to measure whether regularity survives accounting.

Experiments this frontier suggests

  1. compare coarse-block exceptions against fine-grained sparse exceptions at equal final bytes
  2. measure pre-artifact error versus post-artifact score to find ranking reversals
  3. test basis-plus-delta formats where the basis is shared across layers or tensor families
  4. compare repeated small codebooks against larger tensor-specific codebooks
  5. analyze whether shared-depth models create more compressible repetition than non-shared models with the same score before compression
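Experiment 1 can be prototyped in a few lines; the per-exception byte costs and all sizes below are assumptions for illustration, not measurements from any cited paper. Both exception formats get the same byte budget, and we compare reconstruction error against a crude global-scale quantizer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, bs = 16384, 64
w = rng.standard_normal(n).astype(np.float32)
w[rng.choice(n, size=256, replace=False)] *= 8     # inject scattered outliers

def quant(x, bits=4):
    """Crude uniform symmetric quantizer with a global scale (illustration only)."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

budget = 2048                                      # exception budget in bytes (made up)

# Fine-grained format: 2-byte index + 4-byte fp32 value per exception.
k_fine = budget // 6
fine_mask = np.zeros(n, dtype=bool)
fine_mask[np.argsort(-np.abs(w))[:k_fine]] = True

# Coarse format: 2-byte block id + 64 fp32 values per block.
k_blk = budget // (2 + 4 * bs)
blk_mask = np.zeros(n, dtype=bool)
for b in np.argsort(-np.abs(w).reshape(-1, bs).max(axis=1))[:k_blk]:
    blk_mask[b * bs:(b + 1) * bs] = True

def mean_err(mask):
    approx = quant(w).astype(np.float32)
    approx[mask] = w[mask]                         # exceptions stored exactly
    return float(np.abs(approx - w).mean())

print(mean_err(fine_mask), mean_err(blk_mask))     # equal final bytes, two layouts
```

Which layout wins depends on how clustered the injected outliers are, which is exactly the ranking-reversal question experiments 1 and 2 are probing.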

Bottom line

The important question may no longer be “how few bits per weight?”

It may be:

how much useful regularity can the whole model expose to the storage pipeline without giving away too much quality?

Egiazarian, V., Panferov, A., Kuznedelev, D., Frantar, E., Babenko, A., & Alistarh, D. (2024). Extreme Compression of Large Language Models via Additive Quantization. arXiv Preprint arXiv:2401.06118. https://arxiv.org/abs/2401.06118
Liao, B., Herold, C., Hashemi, S. H., Vasilev, S., Khadivi, S., & Monz, C. (2025). ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning. arXiv Preprint arXiv:2503.13089. https://arxiv.org/abs/2503.13089
Ramachandran, A., Kundu, S., & Krishna, T. (2024). MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization. arXiv Preprint arXiv:2411.05282. https://arxiv.org/abs/2411.05282
Üyük, C., Lasby, M., Yassin, M., Evci, U., & Ioannou, Y. (2024). Learning Parameter Sharing with Tensor Decompositions and Sparsity. arXiv Preprint arXiv:2411.09816. https://arxiv.org/abs/2411.09816
Zhao, J., Zhang, M., Wang, M., Shang, Y., Zhang, K., Guan, W., Wang, Y., & Zhang, M. (2025). PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models. arXiv Preprint arXiv:2502.13179. https://arxiv.org/abs/2502.13179