(Ashkboos et al., 2024)

Sources: arXiv:2404.00456 · alphaXiv overview

Core contribution

QuaRot rotates a model's internal representations with orthogonal (Hadamard-based) transforms so that hidden-state outliers are dispersed across channels without changing the model's function: each rotation is folded into the adjacent weight matrices, so the network's outputs are preserved exactly. By changing basis at carefully chosen points, it makes end-to-end 4-bit quantization of weights, activations, and KV cache much easier while avoiding explicit high-precision exception channels.
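The invariance argument can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation: the layer, shapes, and outlier channel are invented, and only the core identity is shown, that rotating activations while folding the inverse rotation into the weights leaves the layer's output unchanged.

```python
import numpy as np

def hadamard(n):
    # Orthonormal Hadamard matrix via Sylvester construction
    # (n must be a power of two) -- the transform family QuaRot builds on.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

rng = np.random.default_rng(0)

# Toy linear layer y = x @ W, with one exaggerated outlier channel in x.
x = rng.normal(size=(32, 64))
x[:, 3] *= 100.0            # hypothetical outlier channel
W = rng.normal(size=(64, 64))

Q = hadamard(64)

# Rotate the activations; fold the inverse rotation into the weights.
x_rot = x @ Q
W_rot = Q.T @ W

# The layer computes exactly the same function ...
assert np.allclose(x @ W, x_rot @ W_rot)

# ... but the outlier's energy is now spread across all channels.
peak = lambda t: np.abs(t).max(axis=0)
print(peak(x).max() / peak(x).mean())        # dominated by channel 3
print(peak(x_rot).max() / peak(x_rot).mean())  # nearly flat
```

Because Q is orthogonal, `x @ Q @ Q.T @ W == x @ W` up to floating-point error, which is why the basis change is "free" functionally.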

Why this matters for Parameter Golf

QuaRot is interesting because it attacks the outlier problem from a different angle than papers like pQuant or AWQ. Instead of protecting the problematic subset directly, it tries to reshape the representation so the subset stops being so problematic. That is a useful conceptual alternative for this garden.

What to import

  • Outliers can sometimes be removed by basis changes, not only by exceptions.
  • Uniform low-bit quantization becomes more viable when the representation is better conditioned.
  • The right transformation can simplify the whole compression stack.
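The second bullet can be made concrete with a small experiment: symmetric per-tensor 4-bit quantization of a tensor that has one outlier channel, before and after a Hadamard rotation. This is a hedged sketch under invented assumptions (the tensor, the outlier channel, and the naive round-to-nearest quantizer are all illustrative), not QuaRot's actual pipeline.

```python
import numpy as np

def hadamard(n):
    # Orthonormal Sylvester Hadamard matrix (n a power of two).
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quant_int4(t):
    # Symmetric per-tensor 4-bit quantization: integer grid in [-7, 7].
    scale = np.abs(t).max() / 7.0
    return np.round(t / scale).clip(-7, 7) * scale

def rel_mse(t):
    # Quantization error relative to the tensor's own energy.
    return np.mean((t - quant_int4(t)) ** 2) / np.mean(t ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, 5] *= 10.0            # one outlier channel stretches the whole grid

Q = hadamard(64)
print(f"plain:   {rel_mse(x):.3f}")      # small channels collapse toward zero
print(f"rotated: {rel_mse(x @ Q):.3f}")  # far lower error, same information
```

The outlier forces a coarse grid that rounds most ordinary values to zero; after rotation the tensor is much better conditioned, so the same uniform quantizer wastes far fewer levels.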

What not to over-import

QuaRot targets 4-bit end-to-end inference, not necessarily the much harsher artifact regime or exact runtime constraints explored here. Rotations may also introduce implementation complexity or interact awkwardly with other model changes. The import is strategic: some outlier problems are representation-design problems rather than saliency-bookkeeping problems.

Parameter Golf translation

QuaRot motivates asking:

  • can a cheap basis change make a tensor quantize cleanly enough that exception machinery is unnecessary?
  • when is it better to normalize or rotate away the problem rather than protect a subset of values?
  • could representation conditioning compose with selective precision instead of replacing it?
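The first two questions can be posed as a toy head-to-head: protect the problematic channel in full precision versus rotate it away and quantize everything uniformly. Everything here is an invented sketch (the data, the outlier channel, and the quantizer), meant only to show that both strategies can reach low error, while the rotation does it with no exception list to store or dispatch on.

```python
import numpy as np

def hadamard(n):
    # Orthonormal Sylvester Hadamard matrix (n a power of two).
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quant_int4(t):
    # Symmetric per-tensor 4-bit quantization: integer grid in [-7, 7].
    scale = np.abs(t).max() / 7.0
    return np.round(t / scale).clip(-7, 7) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, 5] *= 10.0                     # the problematic channel
energy = np.mean(x ** 2)

# Strategy A: saliency bookkeeping -- keep the outlier channel in full
# precision, quantize the rest.
keep = np.zeros(64, dtype=bool)
keep[5] = True
x_a = x.copy()
x_a[:, ~keep] = quant_int4(x[:, ~keep])
err_a = np.mean((x - x_a) ** 2) / energy

# Strategy B: representation conditioning -- rotate, quantize uniformly,
# rotate back; no per-channel exceptions anywhere.
Q = hadamard(64)
x_b = quant_int4(x @ Q) @ Q.T
err_b = np.mean((x - x_b) ** 2) / energy

print(f"exceptions: {err_a:.3f}")
print(f"rotation:   {err_b:.3f}")
```

Both errors come out small, which is the composition question in miniature: conditioning and selective precision are not mutually exclusive, and either (or both) may pay for a given tensor.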

Ashkboos, S., Mohtashami, A., Croci, M. L., Li, B., Cameron, P., Jaggi, M., Alistarh, D., Hoefler, T., & Hensman, J. (2024). QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs. arXiv preprint arXiv:2404.00456. https://arxiv.org/abs/2404.00456