(Zhang et al., 2026)

Sources: arXiv:2602.22592 · alphaXiv overview

Core contribution

pQuant argues that extremely low-bit language models fail partly because they force all parameters through the same cheap path. Its central intervention is a decoupled design in which most parameters stay low-bit while a tiny high-precision branch preserves the most sensitive structure.
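The decoupled idea can be sketched numerically. This is a minimal illustration, not the paper's construction: the 2-bit uniform quantizer and the rank-4 SVD branch are assumptions chosen to show how a tiny high-precision path can absorb the structure the cheap path destroys.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_uniform(w, bits=2):
    """Symmetric uniform quantization to the given bit width (a stand-in
    for a low-bit cheap path; the exact scheme is an assumption)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.clip(np.round(w / scale), -levels, levels) * scale

# Toy weight matrix standing in for one linear projection.
w = rng.normal(size=(64, 64)).astype(np.float32)

# Cheap path: every parameter forced through 2-bit quantization.
w_q = quantize_uniform(w, bits=2)

# Decoupled design: the same cheap path plus a tiny high-precision
# branch. Here the branch is a rank-4 truncated SVD of the quantization
# residual; this choice is illustrative, not the paper's method.
residual = w - w_q
u, s, vt = np.linalg.svd(residual, full_matrices=False)
rank = 4
branch = (u[:, :rank] * s[:rank]) @ vt[:rank]

err_cheap = np.linalg.norm(w - w_q)
err_decoupled = np.linalg.norm(w - (w_q + branch))
print(f"cheap-only error:  {err_cheap:.3f}")
print(f"with branch error: {err_decoupled:.3f}")
```

The branch stores only `rank * (64 + 64 + 1)` high-precision values, yet it strictly reduces reconstruction error, which is the shape of the argument: most parameters stay brutally cheap, a narrow path carries the fragile structure.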

Why this matters for Parameter Golf

This is the clearest single paper behind sparse outlier preservation. It gives a strong conceptual answer to a core Parameter Golf question: if some byte headroom remains, where should it go? pQuant’s answer is not “spread it evenly” but “spend it where uniform compression is most destructive.”

What to import

  • Parameter sensitivity is extremely uneven.
  • A small protected path can beat spending the same byte budget on a uniformly better cheap path.
  • Asymmetry is a feature, not a hack. A compact system can keep most weights brutally cheap and still preserve a narrow expressive rescue route.

What not to over-import

The exact branch design in the paper may not be the best implementation for this repo or challenge. Side channels can bring indexing, metadata, and systems complexity. The durable lesson is broader: the minority of fragile structure matters more than the average parameter.

Parameter Golf translation

pQuant motivates designs such as:

  • sparse residual tensors for the worst compression errors
  • protected rows or channels in the most sensitive projections
  • tiny high-precision branches that rescue capacity without upgrading the whole model
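The first of these, a sparse residual tensor, can be sketched as follows. This is a hedged illustration, not pQuant's algorithm: the 4-bit quantizer, the injected outliers, and the budget `k` are all assumptions, chosen to show headroom being spent where uniform compression is most destructive.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize_uniform(w, bits=4):
    """Symmetric uniform quantization (illustrative, not the paper's)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.clip(np.round(w / scale), -levels, levels) * scale

w = rng.normal(size=(128, 128)).astype(np.float32)
# Inject a few outliers so sensitivity is uneven, as in real projections.
idx = rng.choice(w.size, size=16, replace=False)
w.flat[idx] *= 20.0

w_q = quantize_uniform(w)
residual = w - w_q

# Sparse residual tensor: spend the byte headroom on the k entries where
# the quantization error is largest, kept at full precision.
k = 64  # illustrative budget
keep = np.argpartition(np.abs(residual).ravel(), -k)[-k:]
sparse = np.zeros_like(residual)
sparse.flat[keep] = residual.flat[keep]

err_uniform = np.linalg.norm(residual)
err_rescued = np.linalg.norm(residual - sparse)
print(f"uniform 4-bit error:          {err_uniform:.3f}")
print(f"with sparse residual (k={k}): {err_rescued:.3f}")
```

Because the outliers inflate the quantization scale, the worst errors cluster on a small minority of entries; zeroing just `k` of `128 * 128` residuals removes a disproportionate share of the total error, which is the "spend it where compression is most destructive" move in miniature.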

Reference

Zhang, W., Liu, B., Hu, Y., Bai, X., Zhang, W., & Cui, B. (2026). pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training. arXiv preprint arXiv:2602.22592. https://arxiv.org/abs/2602.22592