(Lin et al., 2024)

Sources: arXiv:2306.00978 · alphaXiv overview

Core contribution

AWQ argues that not all weights are equally important for low-bit inference and that saliency should be identified from activation statistics rather than weight magnitude alone. Its practical insight is that protecting only a tiny fraction of salient channels (on the order of 1%) can dramatically reduce quantization error, and that equivalent scaling transformations can preserve those channels without resorting to hardware-unfriendly mixed precision.
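A minimal NumPy sketch of the scaling idea (not AWQ's full method): quantize a toy linear layer to 4 bits, once naively and once after scaling weight columns by activation magnitude raised to an exponent alpha, with the scale divided out of the activations so the transform is mathematically equivalent before quantization. Following the paper, alpha is picked by grid search on calibration data (alpha = 0 recovers plain round-to-nearest). The layer sizes, toy data, and alpha grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int4(w):
    """Symmetric per-output-channel (per-row) 4-bit fake quantization."""
    step = np.abs(w).max(axis=1, keepdims=True) / 7.0
    return np.clip(np.round(w / step), -8, 7) * step

d_in, d_out, n_calib = 64, 32, 256
W = rng.normal(size=(d_out, d_in))       # toy linear layer
X = rng.normal(size=(n_calib, d_in))     # toy calibration activations
X[:, :4] *= 20.0                         # a few channels carry outlier activations

act_mag = np.abs(X).mean(axis=0)         # activation-aware saliency per input channel
Y_ref = X @ W.T

def quant_error(alpha):
    # Equivalent transform: scale weight columns up, divide activations down.
    s = (act_mag / act_mag.mean()) ** alpha
    Y_q = (X / s) @ quantize_int4(W * s).T
    return np.mean((Y_ref - Y_q) ** 2)

err_naive = quant_error(0.0)             # alpha = 0: plain round-to-nearest
best_alpha = min(np.linspace(0.0, 1.0, 11), key=quant_error)
err_scaled = quant_error(best_alpha)
print(f"naive MSE {err_naive:.4f} -> scaled MSE {err_scaled:.4f} at alpha={best_alpha:.1f}")
```

Because alpha = 0 is in the search grid, the scaled variant can never do worse than naive quantization on the calibration set; the interesting question is how much better it does when activations have outlier channels.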

Why this matters for Parameter Golf

AWQ is one of the most directly relevant practical papers for sparse outlier preservation. It provides an actionable answer to a question that recurs across this garden: if only a sliver of the model really needs help, how do we find it and protect it cheaply?

What to import

  • Saliency is activation-mediated. Weight magnitude alone can miss what truly matters.
  • A tiny subset of weights can dominate quantization damage.
  • Equivalent transformations can sometimes rescue important channels without explicit mixed-precision exceptions.
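The first bullet can be made concrete in a few lines: an input channel with small weights but large typical activations ranks low under weight-magnitude saliency yet first once activation statistics are folded in. The matrix sizes and magnitudes below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in = 16, 8
W = rng.normal(size=(d_out, d_in))
W[:, 0] *= 0.1                      # channel 0: small weights ...
act_mag = np.ones(d_in)
act_mag[0] = 100.0                  # ... but very large typical |activation|

w_score = np.abs(W).mean(axis=0)    # weight-magnitude saliency
a_score = act_mag * w_score         # activation-mediated saliency (AWQ-style)

# weight-only ranking misses channel 0; the activation-mediated one puts it first
print(int(np.argmax(w_score)), int(np.argmax(a_score)))
```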

What not to over-import

AWQ relies on offline activation statistics and is aimed at practical LLM deployment rather than the exact constraints of this challenge. It does not prove that calibration-style saliency selection will transfer cleanly to every local benchmark or compressed artifact format. Still, its central observation is extremely reusable.

Connections

  • Grounds sparse outlier preservation with a more deployment-oriented mechanism than pQuant.
  • Connects to outlier-aware compression by explaining why the important subset should be found through activations.
  • Sits intriguingly beside QuaRot: AWQ protects salient channels, whereas QuaRot tries to remove outliers by changing basis.

Parameter Golf translation

AWQ suggests three useful questions:

  • Which channels are repeatedly salient under the actual data distribution?
  • Can they be protected by rescaling or equivalent transformations rather than explicit higher-precision storage?
  • When is a tiny calibration-informed intervention more byte-efficient than uniformly improving the whole model?
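The first question lends itself to a small calibration sketch: track which channels land in the per-batch top-k by mean absolute activation, and protect only channels that are salient in most batches, which filters out transient spikes. The channel indices, batch counts, and the 80% stability threshold are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_batches, k = 64, 20, 4
stable = [3, 17, 42]                 # hypothetical consistently-loud channels

hits = np.zeros(d)
for _ in range(n_batches):
    x = rng.normal(size=(32, d))
    x[:, stable] *= 15.0             # the same outliers appear in every batch
    spike = rng.integers(0, d)       # plus one transient spike per batch
    x[:, spike] *= 15.0
    topk = np.argsort(-np.abs(x).mean(axis=0))[:k]
    hits[topk] += 1

# channels salient in >80% of calibration batches are worth protecting
protect = np.flatnonzero(hits / n_batches > 0.8)
print(protect)
```

Only the repeatedly salient channels survive the threshold; the one-off spikes, each hitting a different channel, do not accumulate enough votes.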

References

Lin, J., Tang, J., Tang, H., Yang, S., Chen, W.-M., Wang, W.-C., Xiao, G., Dang, X., Gan, C., & Han, S. (2024). AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. arXiv preprint arXiv:2306.00978. https://arxiv.org/abs/2306.00978