Core idea
Extremely low-bit models often fail because they compress everything uniformly, spending the same tiny bit budget on parameters with very different sensitivity. A recurring fix is to reserve a small amount of high-precision capacity for the parameters that matter most.
Why this keeps showing up
- pQuant explicitly frames the problem as parameter democratization and uses a tiny high-precision branch to restore sensitivity structure. (Zhang et al., 2026)
- ClusComp argues that outliers increasingly dominate the difficulty of compressing modern LLMs. (Liao et al., 2025)
- MicroScopiQ reaches a similar conclusion from a hardware-aware perspective: outliers need special treatment, but the treatment must still be deployment-friendly. (Ramachandran et al., 2024)
- PTQ1.61 shows that even post-training methods often succeed or fail based on how they isolate and protect salient structure. (Zhao et al., 2025)
Parameter Golf translation
The important insight is not “use fp16 somewhere.” It is:
- identify where uniform low-bit treatment is most destructive
- spend bytes only there
- keep the majority path extremely cheap
That is the core intuition behind sparse outlier preservation.
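A minimal sketch of that intuition in numpy (the function names and the top-magnitude outlier criterion are ours, illustrative only, not the method of any cited paper): quantize the bulk of a weight tensor to a cheap uniform low-bit grid, but carry the small fraction of largest-magnitude weights as an exact high-precision sparse residual.

```python
import numpy as np

def quantize_with_outliers(w: np.ndarray, bits: int = 2, outlier_frac: float = 0.01):
    """Uniform low-bit quantization with a sparse full-precision outlier set.

    Illustrative sketch: "important" is approximated here as top magnitude;
    real methods use richer saliency signals.
    """
    flat = w.ravel()
    k = max(1, int(outlier_frac * flat.size))
    # Indices of the k largest-magnitude weights: these keep full precision.
    outlier_idx = np.argpartition(np.abs(flat), -k)[-k:]
    dense = flat.copy()
    dense[outlier_idx] = 0.0  # outliers are carried separately

    # Symmetric uniform quantization of the remaining majority path.
    levels = 2 ** (bits - 1) - 1          # e.g. 1 level per side at 2-bit
    scale = np.abs(dense).max() / levels if levels > 0 else 1.0
    q = np.clip(np.round(dense / scale), -levels, levels).astype(np.int8)
    return q, scale, outlier_idx, flat[outlier_idx]

def dequantize(q, scale, outlier_idx, outlier_vals, shape):
    flat = q.astype(np.float32) * scale
    flat[outlier_idx] = outlier_vals      # restore exact outlier values
    return flat.reshape(shape)
```

The byte accounting is the point: at 1% outliers, the sparse branch costs roughly 0.16 extra bits per weight (fp16 value plus index), while the majority path stays at the low-bit rate.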
Design space
This can show up as:
- sparse residual deltas
- protected rows or tensors
- decoupled branches
- clustering or codebook methods that avoid scalar uniformity
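The last point can be sketched concretely. A tiny 1-D k-means codebook (a stand-in for the clustering family, not the algorithm of ClusComp or any other cited paper) places its entries where the weight density actually is, rather than on a uniform scalar grid:

```python
import numpy as np

def kmeans_codebook(w: np.ndarray, k: int = 4, iters: int = 20):
    """Fit a k-entry scalar codebook to the weights via 1-D k-means.

    Illustrative sketch: quantile initialization and plain Lloyd updates.
    Returns the codebook and the per-weight code assignments.
    """
    flat = w.ravel().astype(np.float64)
    # Start centroids at evenly spaced quantiles of the weight distribution.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign each weight to its nearest codebook entry.
        codes = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # Move each entry to the mean of the weights it covers.
        for j in range(k):
            members = flat[codes == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, codes.reshape(w.shape)
```

With k = 4 this is still a 2-bit representation per weight, but the grid adapts to the (typically bell-shaped, heavy-tailed) weight distribution instead of forcing uniform spacing.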
Related
Liao, B., Herold, C., Hashemi, S. H., Vasilev, S., Khadivi, S., & Monz, C. (2025). ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning. arXiv preprint arXiv:2503.13089. https://arxiv.org/abs/2503.13089
Ramachandran, A., Kundu, S., & Krishna, T. (2024). MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization. arXiv preprint arXiv:2411.05282. https://arxiv.org/abs/2411.05282
Zhang, W., Liu, B., Hu, Y., Bai, X., Zhang, W., & Cui, B. (2026). pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training. arXiv preprint arXiv:2602.22592. https://arxiv.org/abs/2602.22592
Zhao, J., Zhang, M., Wang, M., Shang, Y., Zhang, K., Guan, W., Wang, Y., & Zhang, M. (2025). PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models. arXiv preprint arXiv:2502.13179. https://arxiv.org/abs/2502.13179