Observation

At moderate precision, uniform quantization is often good enough. At extreme compression, roughly two bits per weight and below, it often stops being the right abstraction.

The problem is not just average error. Because a uniform grid sets its scale from the largest magnitude in the tensor, a small set of outlier values can stretch the grid, starve the typical values of resolution, and dominate downstream degradation.
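A minimal numpy sketch makes the mechanism concrete (function and variable names are illustrative, not from any cited paper): symmetric per-tensor quantization derives its scale from the largest magnitude, so a single outlier inflates the scale and collapses the resolution available to everything else.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_uniform(w, bits=4):
    """Symmetric uniform quantization with one scale for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 levels each side for 4-bit signed
    scale = np.abs(w).max() / qmax      # scale is dictated by the largest magnitude
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                    # dequantized values

w = rng.normal(0, 0.02, size=10_000)    # typical narrow weight distribution
w_out = w.copy()
w_out[0] = 1.0                          # a single injected outlier

err_clean = np.abs(quantize_uniform(w) - w).mean()
err_outlier = np.abs(quantize_uniform(w_out) - w_out).mean()

print(f"mean abs error, no outlier:  {err_clean:.6f}")
print(f"mean abs error, one outlier: {err_outlier:.6f}")
```

With the outlier present, the scale jumps from roughly max|w|/7 of the narrow bulk to 1/7, so most weights round to zero and the mean error is several times larger even though only one value changed.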

Evidence across papers

Practical lesson

When the bit budget is extremely tight, the right question is often not “what global bit-width should we use?” but:

“which tiny subset of the model cannot survive the cheap path?”

References

Egiazarian, V., Panferov, A., Kuznedelev, D., Frantar, E., Babenko, A., & Alistarh, D. (2024). Extreme Compression of Large Language Models via Additive Quantization. arXiv preprint arXiv:2401.06118. https://arxiv.org/abs/2401.06118
Liao, B., Herold, C., Hashemi, S. H., Vasilev, S., Khadivi, S., & Monz, C. (2025). ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning. arXiv preprint arXiv:2503.13089. https://arxiv.org/abs/2503.13089
Ramachandran, A., Kundu, S., & Krishna, T. (2024). MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization. arXiv preprint arXiv:2411.05282. https://arxiv.org/abs/2411.05282
Zhang, W., Liu, B., Hu, Y., Bai, X., Zhang, W., & Cui, B. (2026). pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training. arXiv preprint arXiv:2602.22592. https://arxiv.org/abs/2602.22592
Zhao, J., Zhang, M., Wang, M., Shang, Y., Zhang, K., Guan, W., Wang, Y., & Zhang, M. (2025). PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models. arXiv preprint arXiv:2502.13179. https://arxiv.org/abs/2502.13179