(Liao et al., 2025)

Sources: arXiv:2503.13089 · alphaXiv overview

Core contribution

ClusComp reframes model compression around clustering and shared representatives rather than only scalar bit-width reduction. The paper’s most useful claim for this garden is that modern LLMs are increasingly hard to quantize because outliers dominate error, and a clustering-style representation can preserve more of the salient structure than uniform quantization.

Why this matters for Parameter Golf

ClusComp is unusually valuable because it bridges two lanes that are often treated separately: compression and structured reuse.

The paper is nominally about compression, but the mechanism, mapping many weights onto shared representatives, is also a form of structured reuse. That makes it relevant both to “how do we quantize?” and to “how much uniqueness do we really need?”

What to import

  • Outliers are not a corner case; they can dominate compression error.
  • Shared representatives can be better than uniform scalar bins. Clustering buys expressivity by allocating bits to structure rather than to equal treatment.
  • Compression and finetuning interact. The paper treats them as a continuous design problem rather than separate phases.
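The contrast between uniform bins and shared representatives can be made concrete with a toy experiment. This is not the paper's algorithm, just a minimal 1-D sketch: a weight vector whose range is stretched by a few outliers is quantized once with uniform scalar levels and once with a k-means codebook, and the reconstruction error is compared. All sizes and distributions here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy weight vector: mostly small values plus a few large outliers
# (hypothetical distribution, chosen only to make the contrast visible).
w = rng.normal(0.0, 0.02, 1024)
w[:8] = rng.normal(0.0, 2.0, 8)  # outliers stretch the dynamic range

def uniform_quant(x, n_levels=16):
    """Uniform scalar quantization over the full range; outliers widen every bin."""
    lo, hi = x.min(), x.max()
    step = (hi - lo) / (n_levels - 1)
    return lo + np.round((x - lo) / step) * step

def kmeans_codebook(x, k=16, iters=50):
    """1-D k-means: shared representatives concentrate where the mass actually is."""
    centers = np.quantile(x, np.linspace(0.0, 1.0, k))  # quantile init
    for _ in range(iters):
        assign = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean()
    return centers[assign]

err_uniform = np.mean((w - uniform_quant(w)) ** 2)
err_cluster = np.mean((w - kmeans_codebook(w)) ** 2)
print(f"uniform MSE: {err_uniform:.2e}  clustered MSE: {err_cluster:.2e}")
```

With the same budget of 16 representative values, the clustered version typically lands far lower in mean-squared error, because most representatives end up inside the dense region instead of being spent evenly across an outlier-stretched range.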

What not to over-import

ClusComp does not automatically imply that codebooks or cluster assignments are cheap enough for a hard 16 MB artifact budget. It also does not prove that the same cluster structure that helps standard inference will be easy to implement in a highly constrained submission format. The important import is the lens: non-uniformity and reuse are often the right abstraction together.
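The budget concern above is easy to sanity-check with back-of-the-envelope arithmetic. The numbers below are entirely hypothetical (a 100M-parameter model, groups of 4 weights, a 65,536-entry fp16 codebook), not figures from the paper; the point is only that assignment indices, not the codebook itself, tend to dominate storage.

```python
# Hypothetical sizes -- not from the paper.
n_weights = 100_000_000          # assumed model size: 100M parameters
group = 4                        # weights per clustered vector
k = 65_536                       # codebook entries -> 16-bit assignment indices
assign_bits = 16                 # log2(k)

codebook_bytes = k * group * 2                       # fp16 representatives
assign_bytes = (n_weights // group) * assign_bits // 8
total_mib = (codebook_bytes + assign_bytes) / 2**20

print(f"codebook: {codebook_bytes / 2**20:.2f} MiB, "
      f"assignments: {assign_bytes / 2**20:.2f} MiB, "
      f"total: {total_mib:.1f} MiB")
```

Under these assumptions the codebook is half a MiB but the assignments alone are close to 48 MiB, comfortably blowing a 16 MB artifact budget; any serious attempt would need coarser groups, fewer clusters, or entropy-coded indices.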

Parameter Golf translation

This paper motivates experiments that ask:

  • should some tensors be clustered or shared instead of merely quantized?
  • can clustering logic identify the same sensitive subsets targeted by pQuant?
  • when does structured reuse buy more than a slightly wider uniformly quantized model?

Liao, B., Herold, C., Hashemi, S. H., Vasilev, S., Khadivi, S., & Monz, C. (2025). ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning. arXiv preprint arXiv:2503.13089. https://arxiv.org/abs/2503.13089