Sources: arXiv:2602.22592 · alphaXiv overview
Core contribution
pQuant argues that extremely low-bit language models fail partly because they force all parameters through the same cheap path. Its central intervention is a decoupled design in which most parameters stay low-bit while a tiny high-precision branch preserves the most sensitive structure.
Why this matters for Parameter Golf
This is the clearest single paper behind sparse outlier preservation. It gives a strong conceptual answer to a core Parameter Golf question: if some byte headroom remains, where should it go? pQuant’s answer is not “spread it evenly” but “spend it where uniform compression is most destructive.”
What to import
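- Parameter sensitivity is extremely uneven: a small minority of weights accounts for a disproportionate share of the damage from uniform compression.
- A small protected path can beat spending the same budget on a slightly better uniform path.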
- Asymmetry is a feature, not a hack. A compact system can keep most weights brutally cheap and still preserve a narrow expressive rescue route.
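A toy comparison makes the second point concrete. The sketch below is not pQuant's method. It assumes symmetric absmax quantization, a synthetic weight vector with eight planted outliers, and an fp16 side channel with 16-bit indices. All function names are hypothetical. It compares uniform 3-bit storage against 2-bit storage plus a handful of protected values.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def quantize_absmax(values, bits):
    """Symmetric absmax quantization; returns the dequantized values."""
    qmax = 2 ** (bits - 1) - 1  # 3-bit -> 3 levels per side, 2-bit -> 1 (ternary)
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:
        return [0.0] * len(values)
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def make_weights(n=1024, n_outliers=8):
    """Synthetic data (an assumption): bulk in [-0.1, 0.1] plus a few +/-4.0 outliers."""
    w = [((i * 37) % 101 - 50) / 500 for i in range(n)]
    step = n // n_outliers
    for j in range(n_outliers):
        w[j * step + 5] = 4.0 if j % 2 == 0 else -4.0
    return w

def uniform_plan(w, bits=3):
    """Every weight goes through the same low-bit path."""
    return quantize_absmax(w, bits), len(w) * bits

def protected_plan(w, k=8, bits=2, index_bits=16):
    """Top-k weights by magnitude go to an fp16 side channel; the rest stay low-bit."""
    keep = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:k]
    keep_set = set(keep)
    # Zero the protected slots so they no longer stretch the shared scale.
    bulk = [0.0 if i in keep_set else v for i, v in enumerate(w)]
    deq = quantize_absmax(bulk, bits)
    for i in keep:
        deq[i] = to_fp16(w[i])
    cost_bits = len(w) * bits + k * (16 + index_bits)
    return deq, cost_bits

w = make_weights()
uni, uni_bits = uniform_plan(w)
pro, pro_bits = protected_plan(w)
print(f"uniform 3-bit : {uni_bits} bits, MSE {mse(w, uni):.5f}")
print(f"2-bit + fp16  : {pro_bits} bits, MSE {mse(w, pro):.5f}")
```

On this synthetic data the protected plan uses fewer bits and has lower error, because removing the outliers lets the low-bit scale fit the bulk of the distribution. Real weight tensors are less cleanly separated, so the size of the gap here should not be read as a prediction.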
What not to over-import
The exact branch design in the paper may not be the best implementation for this repo or challenge. Side channels bring their own costs: index storage, metadata, and extra systems complexity. The durable lesson is broader: the minority of fragile structure matters more than the average parameter.
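The indexing and metadata cost can be estimated before committing to a layout. The following is a back-of-envelope sketch, assuming three hypothetical storage layouts (coordinate list, bitmask, whole protected rows) and 16-bit protected values. None of these layouts is taken from the paper.

```python
def coo_bits(n_params, k, value_bits=16):
    """k (index, value) pairs; each index must be able to address n_params slots."""
    index_bits = max(1, (n_params - 1).bit_length())
    return k * (index_bits + value_bits)

def bitmask_bits(n_params, k, value_bits=16):
    """One mask bit per parameter, plus the k protected values stored in order."""
    return n_params + k * value_bits

def protected_rows_bits(n_rows, n_cols, r, value_bits=16):
    """r whole rows at high precision; only one row index per protected row."""
    row_index_bits = max(1, (n_rows - 1).bit_length())
    return r * (n_cols * value_bits + row_index_bits)

def effective_bits_per_param(n_params, base_bits, side_bits):
    """Average storage cost once the side channel is amortized over all weights."""
    return (n_params * base_bits + side_bits) / n_params

n = 1_000_000
side = coo_bits(n, k=1000)
print("COO side channel bits:    ", side)
print("bitmask side channel bits:", bitmask_bits(n, k=1000))
print("effective bits per param: ", effective_bits_per_param(n, 2, side))
```

With very sparse protection the coordinate list is far cheaper than a bitmask; the bitmask only wins once the protected fraction exceeds roughly one over the index width. Protecting whole rows or channels trades selection granularity for almost no index overhead.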
Best synthesis links
- Directly grounds decoupled precision.
- Strengthens outlier-aware compression by offering a concrete asymmetric mechanism.
- Connects naturally to AWQ, PTQ1.61, and MicroScopiQ, all of which reject uniform treatment in different ways.
Parameter Golf translation
pQuant motivates designs such as:
- sparse residual tensors for the worst compression errors
- protected rows or channels in the most sensitive projections
- tiny high-precision branches that rescue capacity without upgrading the whole model
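As one illustration of the protected-rows design, here is a minimal sketch that assumes per-row 2-bit absmax quantization and uses plain squared quantization error as the sensitivity proxy. That proxy is an assumption for illustration: activation-aware or loss-aware scores may rank rows differently. The function names and the example matrix are hypothetical.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def quantize_row(row, bits=2):
    """Per-row symmetric absmax quantization; returns dequantized values."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in row) / qmax
    if scale == 0.0:
        return [0.0] * len(row)
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in row]

def row_error(row, approx):
    """Squared reconstruction error for one row."""
    return sum((a - b) ** 2 for a, b in zip(row, approx))

def compress_with_protected_rows(matrix, n_protected, bits=2):
    """Quantize every row, then keep the worst-reconstructed rows in fp16."""
    quantized = [quantize_row(row, bits) for row in matrix]
    errors = [row_error(r, q) for r, q in zip(matrix, quantized)]
    ranked = sorted(range(len(matrix)), key=lambda i: errors[i], reverse=True)
    protected = sorted(ranked[:n_protected])
    result = [list(q) for q in quantized]
    for i in protected:
        result[i] = [to_fp16(v) for v in matrix[i]]
    return result, protected, errors

# Row 1 carries one large value that forces a coarse scale on its neighbours.
W = [
    [0.10, -0.20, 0.05, 0.15],
    [3.00, -0.40, 0.70, 0.90],
    [0.02, 0.03, -0.01, 0.04],
    [-0.15, 0.10, 0.20, -0.05],
]
approx, protected, errors = compress_with_protected_rows(W, n_protected=1)
before = sum(errors)
after = sum(row_error(r, a) for r, a in zip(W, approx))
print("protected rows:", protected)
print(f"total squared error: {before:.4f} -> {after:.4f}")
```

The same selection loop adapts to the sparse-residual variant by ranking individual entries of the residual instead of whole rows, at the price of storing one index per kept entry.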
Related
- Sparse outlier preservation
- Decoupled precision
- Outlier-aware compression
- AWQ
- QuEST
- PTQ1.61
- Quantization and outliers