A recurring mistake in compression discussions is to compare methods by nominal bit-width or reconstruction quality alone, before the final artifact format has been accounted for.
The more interesting seam across Additive Quantization, ClusComp, Fine-grained Parameter Sharing, PTQ1.61, and MicroScopiQ is this:
the winning model may be the one whose structure is easiest for the whole storage pipeline to exploit, not the one with the prettiest local quantizer.
Why this seam matters now
Several recent papers are converging on “non-uniformity,” but they do not all mean the same thing.
- Additive Quantization says extreme compression can favor codebooks over scalar quantization. (Egiazarian et al., 2024)
- ClusComp says clustering-like structure can beat uniform low-bit treatment when outliers dominate. (Liao et al., 2025)
- Fine-grained Parameter Sharing says shared bases plus sparse factors can outperform naive all-or-nothing tying. (Üyük et al., 2024)
- PTQ1.61 (Zhao et al., 2025) and MicroScopiQ (Ramachandran et al., 2024) both emphasize that saliency protection only works if its overhead stays controlled.
Put together, these papers point past “low-bit weights” toward a different question:
what kinds of model structure naturally compress well once values, indices, bases, masks, and repeated patterns all enter the artifact?
The key synthesis
There are at least three kinds of useful regularity:
1. Value regularity
Weights fall onto a small set of repeated or codebook-like values.
Natural paper bridge:
- Additive Quantization
- ClusComp
2. Basis regularity
Multiple tensors are explained by a small shared basis plus cheap corrections.
Natural paper bridge:
- Fine-grained Parameter Sharing
- Relaxed Recursive Transformers
- Recursive and shared-parameter architectures
3. Exception regularity
The “special cases” that need better treatment form patterns that can themselves be compressed well.
Natural paper bridge:
- PTQ1.61
- MicroScopiQ
- Outlier-aware compression
The frontier claim is that these three regularities may compose better than they look in separate literatures.
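A toy sketch of the first regularity, with synthetic streams rather than anything from the cited papers: an index stream drawn from a small codebook compresses well past its one-byte-per-index container, while an irregular stream does not.

```python
# Toy illustration (synthetic data): value regularity makes the index
# stream entropy-codable beyond its nominal bits-per-weight.
import random
import zlib

random.seed(0)
n = 4096

# Codebook-like weights: every value is one of 8 centroids -> 3 nominal bits.
codebook_ids = [random.randrange(8) for _ in range(n)]

# Irregular weights quantized to 256 levels -> 8 nominal bits.
irregular_ids = [random.randrange(256) for _ in range(n)]

def packed_then_zlib(ids):
    # One byte per index before entropy coding, for a like-for-like container.
    return len(zlib.compress(bytes(ids), 9))

print("codebook ids :", packed_then_zlib(codebook_ids), "bytes")
print("irregular ids:", packed_then_zlib(irregular_ids), "bytes")
```

The codebook stream lands near its 3-bit entropy; the 256-level stream is essentially incompressible, so the nominal-bit gap survives into the artifact.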
A falsifiable thesis
Thesis: under a strict size cap, models whose deviations from cheap structure are regular and clustered will beat models with slightly better local quantization error but more irregular metadata.
In other words, artifact success may depend more on entropy of exceptions than on average quantization error.
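A minimal illustration of the metadata side of the thesis, using synthetic masks rather than real saliency patterns: the same number of exceptions costs far fewer bytes after entropy coding when the exceptions cluster.

```python
# Toy sketch (illustrative only): two exception masks with the same
# exception count, one clustered into contiguous blocks, one scattered.
import random
import zlib

random.seed(1)
n, k, block = 8192, 512, 64   # positions, exceptions, cluster size

clustered = bytearray(n)
for start in random.sample(range(0, n, block), k // block):
    for i in range(start, start + block):
        clustered[i] = 1      # 8 solid runs of 64 exceptions

scattered = bytearray(n)
for i in random.sample(range(n), k):
    scattered[i] = 1          # 512 isolated exceptions

size = lambda m: len(zlib.compress(bytes(m), 9))
print("clustered mask :", size(clustered), "bytes")
print("scattered mask :", size(scattered), "bytes")
```

Both masks mark exactly 512 positions, so any difference in compressed size is pure exception entropy, which is exactly the quantity the thesis bets on.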
What would support it
- clustered or basis-based exception formats beat equally sized random sparse exceptions after final compression
- repeated bases and repeated codebooks survive the artifact pipeline unusually well
- methods with modestly worse pre-artifact error win after the full codec is applied
What would falsify it
- final compression tracks nominal quantizer quality closely, with little bonus for regular structure
- metadata costs stay negligible even for irregular exception patterns
- regularity constraints hurt model quality too much to pay back
The strongest new idea hiding here
A promising Parameter Golf direction is storage-native model design.
That means we stop thinking of compression as the final stage applied to a finished model. Instead we ask whether the model can be designed so its learned structure already looks favorable to downstream coding.
Examples of what that might mean:
- a shared recurrent block with tiny regular phase adapters instead of many unique layers
- grouped or codebook-like residuals whose indices repeat heavily
- protected subsets chosen in coarse blocks or rows rather than arbitrary masks
- basis-plus-delta decompositions where both the basis and the correction formats repeat across tensors
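A back-of-envelope sketch of the basis-plus-delta item above; every size, fraction, and byte cost here is a made-up placeholder, chosen only to show how the bookkeeping works.

```python
# Hypothetical byte accounting: N unique layers vs. one shared basis
# plus small per-layer sparse corrections (values + index metadata).
def unique_layers(n_layers, params_per_layer, bytes_per_param):
    return n_layers * params_per_layer * bytes_per_param

def basis_plus_delta(n_layers, params_per_layer, bytes_per_param,
                     delta_fraction, index_bytes_per_delta):
    basis = params_per_layer * bytes_per_param            # stored once
    deltas_per_layer = int(params_per_layer * delta_fraction)
    per_layer = deltas_per_layer * (bytes_per_param + index_bytes_per_delta)
    return basis + n_layers * per_layer

base = unique_layers(32, 1_000_000, 2)                    # fp16-style baseline
shared = basis_plus_delta(32, 1_000_000, 2, 0.02, 3)      # 2% deltas per layer
print(f"unique layers : {base:,} bytes")
print(f"shared basis  : {shared:,} bytes")
```

Note that the index metadata (3 hypothetical bytes per delta) already exceeds the value bytes, which is why irregular delta positions can erase the win.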
This is where the quantization-and-outliers line of work meets recursive sharing in a way the current graph only hints at.
Why this frontier is different from “use better compression”
“Use better compression” usually means a smarter algorithm at the end. This frontier instead says the model should be judged by how well it cooperates with compression.
That shifts the research question from:
- which post-hoc codec is best?
to:
- what learned structure keeps both values and metadata low-entropy?
The second question is much closer to a true Parameter Golf objective.
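The second question can at least be made measurable: split the artifact into a value stream and a metadata stream and estimate each one's entropy separately. A rough stdlib sketch, with toy byte streams standing in for real tensors:

```python
# Per-stream entropy estimate (illustrative; streams are synthetic bytes).
import math
from collections import Counter

def entropy_bits_per_byte(stream: bytes) -> float:
    # Empirical Shannon entropy of the byte distribution.
    counts = Counter(stream)
    n = len(stream)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

values   = bytes([3, 3, 7, 3, 7, 3, 3, 7] * 64)   # codebook-like value stream
metadata = bytes([1, 0, 2, 5, 4, 7, 6, 3] * 64)   # index/mask side-channel
print(f"values  : {entropy_bits_per_byte(values):.2f} bits/byte")
print(f"metadata: {entropy_bits_per_byte(metadata):.2f} bits/byte")
```

A model only satisfies the second question when both numbers stay low; here the toy metadata stream is the expensive one, which is the usual failure mode.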
Important cautions
This seam is easy to romanticize. It can fail in boring ways:
- codebooks and indices may cost more than they save
- regularity constraints may destroy important rare structure
- the “entropy-friendly” model may become harder to train than a less structured baseline
- gains may be specific to one artifact pipeline and not robust
So the point is not to assume regularity helps. The point is to measure whether regularity survives accounting.
Experiments this frontier suggests
- compare coarse-block exceptions against fine-grained sparse exceptions at equal final bytes
- measure pre-artifact error versus post-artifact score to find ranking reversals
- test basis-plus-delta formats where the basis is shared across layers or tensor families
- compare repeated small codebooks against larger tensor-specific codebooks
- analyze whether shared-depth models create more compressible repetition than non-shared models with the same score before compression
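The first of these experiments can be sketched as a tiny harness; `compare_at_equal_bytes` and the demo artifacts are hypothetical stand-ins for a real encoding pipeline and model scorer.

```python
# Skeleton of the equal-final-bytes comparison: hold post-codec size
# fixed and see which exception formats even fit the budget.
import zlib

def final_bytes(artifact: bytes) -> int:
    # The quantity the thesis says must be held equal across methods.
    return len(zlib.compress(artifact, 9))

def compare_at_equal_bytes(artifacts: dict, budget: int) -> dict:
    # Keep only formats that fit the byte cap after the full codec;
    # a real harness would then score each surviving model.
    return {name: final_bytes(a) for name, a in artifacts.items()
            if final_bytes(a) <= budget}

demo = {
    "coarse_blocks": bytes(1024),             # highly regular -> tiny
    "random_sparse": bytes(range(256)) * 4,   # irregular -> stays large
}
print(compare_at_equal_bytes(demo, budget=100))
```

At the same pre-codec size (1024 bytes each), only the regular format survives the cap, which is the ranking-reversal effect the experiments are probing for.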
Bottom line
The important question may no longer be “how few bits per weight?”
It may be:
how much useful regularity can the whole model expose to the storage pipeline without giving away too much quality?
Related
- Byte allocation beats average bit-width
- Quantization and outliers
- Recursive and shared-parameter architectures
- Outlier-aware compression
- Recursive layer sharing