Sources: arXiv:2503.13089 · alphaXiv overview
Core contribution
ClusComp reframes model compression around clustering and shared representatives rather than only scalar bit-width reduction. The paper’s most useful claim for this garden is that modern LLMs are increasingly hard to quantize because outliers dominate error, and a clustering-style representation can preserve more of the salient structure than uniform quantization.
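The contrast the paper draws can be sketched in a few lines. This is not ClusComp's actual method (which clusters weight blocks and couples compression with finetuning); it is a minimal 1-D toy showing why shared representatives from clustering beat equal-width scalar bins when a few outliers stretch the dynamic range. All names and the toy weight vector are illustrative.

```python
import numpy as np

def uniform_quantize(w, n_levels):
    """Scalar uniform quantization: equal-width bins spanning [min, max]."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (n_levels - 1)
    return lo + np.round((w - lo) / step) * step

def cluster_quantize(w, n_levels, iters=50):
    """Non-uniform quantization via 1-D Lloyd/k-means: each weight is
    replaced by its cluster's shared representative (the centroid)."""
    # initialize centroids at quantiles so they track the weight density
    centroids = np.quantile(w, np.linspace(0.0, 1.0, n_levels))
    for _ in range(iters):
        assign = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_levels):
            if (assign == k).any():
                centroids[k] = w[assign == k].mean()
    assign = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids[assign]

# toy heavy-tailed weight vector: a dense bulk plus a few large outliers
w = np.concatenate([np.linspace(-0.05, 0.05, 1000),
                    np.array([-8.0, -4.0, 4.0, 8.0])])

err_uniform = np.mean((w - uniform_quantize(w, 16)) ** 2)
err_cluster = np.mean((w - cluster_quantize(w, 16)) ** 2)
print(f"uniform 16-level MSE:   {err_uniform:.2e}")
print(f"clustered 16-level MSE: {err_cluster:.2e}")
```

The uniform grid wastes nearly all of its 16 levels covering the empty span between the bulk and the outliers, while the clustered codebook concentrates representatives where the weights actually live.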
Why this matters for Parameter Golf
ClusComp is unusually valuable because it bridges two lanes that are often treated separately: quantization and parameter sharing. The paper is nominally about compression, but the mechanism is also a form of structured reuse, which makes it relevant both to “how do we quantize?” and to “how much uniqueness do we really need?”
What to import
- Outliers are not a corner case. They can dominate compression failure.
- Shared representatives can be better than uniform scalar bins. Clustering buys expressivity by spending bits where the weight distribution has structure instead of spreading them evenly across the range.
- Compression and finetuning interact. The paper treats them as a continuous design problem rather than separate phases.
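The first bullet can be made concrete with the sparse-outlier trick this garden links to below: keep a small fraction of the largest-magnitude weights in full precision and quantize only the remaining bulk. A minimal numpy sketch under toy assumptions (the threshold, fraction, and weight vector are illustrative, not from the paper):

```python
import numpy as np

def uniform_quantize(w, n_levels):
    """Scalar uniform quantization: equal-width bins spanning [min, max]."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (n_levels - 1)
    return lo + np.round((w - lo) / step) * step

def quantize_keep_outliers(w, n_levels, outlier_frac=0.005):
    """Keep the largest-magnitude weights exact (a sparse side table)
    and uniformly quantize only the remaining bulk."""
    k = max(1, int(round(len(w) * outlier_frac)))
    keep = np.zeros(len(w), dtype=bool)
    keep[np.argsort(np.abs(w))[-k:]] = True   # indices of the outliers
    out = w.astype(float).copy()
    out[~keep] = uniform_quantize(w[~keep], n_levels)
    return out

w = np.concatenate([np.linspace(-0.05, 0.05, 1000),
                    np.array([-8.0, -4.0, 4.0, 8.0])])

err_naive = np.mean((w - uniform_quantize(w, 16)) ** 2)
err_outlier = np.mean((w - quantize_keep_outliers(w, 16)) ** 2)
print(f"quantize everything: {err_naive:.2e}")
print(f"preserve outliers:   {err_outlier:.2e}")
```

Removing a handful of outliers shrinks the bulk's quantization range by orders of magnitude, which is exactly the sense in which a minority of structure drives most of the damage.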
What not to over-import
ClusComp does not automatically imply that codebooks or cluster assignments are cheap enough for a hard 16 MB artifact budget. Nor does it prove that the cluster structure that helps standard inference will be easy to implement in a highly constrained submission format. The important import is the lens: non-uniformity and reuse often work best as a single abstraction rather than as separate tricks.
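Whether assignments and codebooks fit a tight artifact budget is mostly arithmetic, and worth doing before dismissing the idea. A back-of-envelope check with hypothetical sizes (not the paper's numbers; scales and zero-points are ignored for simplicity): per-weight cluster ids cost about the same as uniform low-bit storage, and the codebook itself is negligible at tensor scale.

```python
import math

def clustered_bytes(n_weights, n_clusters, rep_bits=16):
    """Bytes for a clustered tensor: one cluster id per weight
    plus a shared codebook of fp16 representatives."""
    id_bits = math.ceil(math.log2(n_clusters))
    return (n_weights * id_bits + n_clusters * rep_bits) / 8

def uniform_bytes(n_weights, bits):
    """Bytes for plain uniform quantization at a fixed bit-width."""
    return n_weights * bits / 8

n = 4096 * 4096  # one hypothetical 4096x4096 linear layer
print(clustered_bytes(n, 16))  # 4-bit ids plus a 16-entry codebook
print(uniform_bytes(n, 4))     # plain 4-bit uniform quantization
```

For a 16-entry codebook the overhead versus plain 4-bit storage is only the codebook itself (here 32 bytes for the whole layer), so the real budget question is the id bit-width and any per-group metadata, not the representatives.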
Best synthesis links
- Strengthens sparse outlier preservation by arguing that a minority of structure drives most of the damage.
- Sits naturally beside decoupled precision, since both reject “treat every parameter the same.”
- Offers a compression-side counterpart to Fine-grained Parameter Sharing, where structure and reuse matter more than naive tying.
Parameter Golf translation
This paper motivates experiments that ask:
- should some tensors be clustered or shared instead of merely quantized?
- can clustering logic identify the same sensitive subsets targeted by pQuant?
- when does structured reuse buy more than a slightly wider uniformly quantized model?
Related
- pQuant
- Additive Quantization
- Fine-grained Parameter Sharing
- Quantization and outliers
- Recursive and shared-parameter architectures
- Sparse outlier preservation
- Decoupled precision