(Ashkboos et al., 2024)

Sources: arXiv:2404.00456 · alphaXiv overview

Core contribution

QuaRot rotates a model's internal representations with orthogonal (Hadamard-based) transforms so that hidden-state outliers are dispersed across channels without changing the model's function: each rotation is folded into the adjacent weight matrices, so the network's outputs are preserved exactly. By changing basis at carefully chosen points, it makes end-to-end 4-bit quantization of weights, activations, and KV cache much easier while avoiding explicit high-precision exception channels.
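The invariance argument can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation: the layer, shapes, and outlier channel are invented, and only the core identity is shown, that rotating activations while folding the inverse rotation into the weights leaves the layer's output unchanged.

```python
import numpy as np

def hadamard(n):
    # Orthonormal Hadamard matrix via Sylvester construction
    # (n must be a power of two) -- the transform family QuaRot builds on.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

rng = np.random.default_rng(0)

# Toy linear layer y = x @ W, with one exaggerated outlier channel in x.
x = rng.normal(size=(32, 64))
x[:, 3] *= 100.0            # hypothetical outlier channel
W = rng.normal(size=(64, 64))

Q = hadamard(64)

# Rotate the activations; fold the inverse rotation into the weights.
x_rot = x @ Q
W_rot = Q.T @ W

# The layer computes exactly the same function ...
assert np.allclose(x @ W, x_rot @ W_rot)

# ... but the outlier's energy is now spread across all channels.
peak = lambda t: np.abs(t).max(axis=0)
print(peak(x).max() / peak(x).mean())        # dominated by channel 3
print(peak(x_rot).max() / peak(x_rot).mean())  # nearly flat
```

Because Q is orthogonal, `x @ Q @ Q.T @ W == x @ W` up to floating-point error, which is why the basis change is "free" functionally.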

Why this matters for Parameter Golf

QuaRot is interesting because it attacks the outlier problem from a different angle than papers like pQuant or AWQ. Instead of protecting the problematic subset directly, it tries to reshape the representation so the subset stops being so problematic. That is a useful conceptual alternative for this garden.

What to import

  • Outliers can sometimes be removed by basis changes, not only by exceptions.
  • Uniform low-bit quantization becomes more viable when the representation is better conditioned.
  • The right transformation can simplify the whole compression stack.
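The second bullet can be made concrete with a small experiment: symmetric per-tensor 4-bit quantization of a tensor that has one outlier channel, before and after a Hadamard rotation. This is a hedged sketch under invented assumptions (the tensor, the outlier channel, and the naive round-to-nearest quantizer are all illustrative), not QuaRot's actual pipeline.

```python
import numpy as np

def hadamard(n):
    # Orthonormal Sylvester Hadamard matrix (n a power of two).
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quant_int4(t):
    # Symmetric per-tensor 4-bit quantization: integer grid in [-7, 7].
    scale = np.abs(t).max() / 7.0
    return np.round(t / scale).clip(-7, 7) * scale

def rel_mse(t):
    # Quantization error relative to the tensor's own energy.
    return np.mean((t - quant_int4(t)) ** 2) / np.mean(t ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, 5] *= 10.0            # one outlier channel stretches the whole grid

Q = hadamard(64)
print(f"plain:   {rel_mse(x):.3f}")      # small channels collapse toward zero
print(f"rotated: {rel_mse(x @ Q):.3f}")  # far lower error, same information
```

The outlier forces a coarse grid that rounds most ordinary values to zero; after rotation the tensor is much better conditioned, so the same uniform quantizer wastes far fewer levels.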

What not to over-import

QuaRot targets 4-bit end-to-end inference, not necessarily the much harsher artifact regime or exact runtime constraints explored here. Rotations may also introduce implementation complexity or interact awkwardly with other model changes. The import is strategic: some outlier problems are representation-design problems rather than saliency-bookkeeping problems.

Parameter Golf translation

QuaRot motivates asking:

  • can a cheap basis change make a tensor quantize cleanly enough that exception machinery is unnecessary?
  • when is it better to normalize or rotate away the problem rather than protect a subset of values?
  • could representation conditioning compose with selective precision instead of replacing it?
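The first two questions can be posed as a toy head-to-head: protect the problematic channel in full precision versus rotate it away and quantize everything uniformly. Everything here is an invented sketch (the data, the outlier channel, and the quantizer), meant only to show that both strategies can reach low error, while the rotation does it with no exception list to store or dispatch on.

```python
import numpy as np

def hadamard(n):
    # Orthonormal Sylvester Hadamard matrix (n a power of two).
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quant_int4(t):
    # Symmetric per-tensor 4-bit quantization: integer grid in [-7, 7].
    scale = np.abs(t).max() / 7.0
    return np.round(t / scale).clip(-7, 7) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, 5] *= 10.0                     # the problematic channel
energy = np.mean(x ** 2)

# Strategy A: saliency bookkeeping -- keep the outlier channel in full
# precision, quantize the rest.
keep = np.zeros(64, dtype=bool)
keep[5] = True
x_a = x.copy()
x_a[:, ~keep] = quant_int4(x[:, ~keep])
err_a = np.mean((x - x_a) ** 2) / energy

# Strategy B: representation conditioning -- rotate, quantize uniformly,
# rotate back; no per-channel exceptions anywhere.
Q = hadamard(64)
x_b = quant_int4(x @ Q) @ Q.T
err_b = np.mean((x - x_b) ** 2) / energy

print(f"exceptions: {err_a:.3f}")
print(f"rotation:   {err_b:.3f}")
```

Both errors come out small, which is the composition question in miniature: conditioning and selective precision are not mutually exclusive, and either (or both) may pay for a given tensor.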

Ashkboos, S., Mohtashami, A., Croci, M. L., Li, B., Cameron, P., Jaggi, M., Alistarh, D., Hoefler, T., & Hensman, J. (2024). QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs. arXiv preprint arXiv:2404.00456. https://arxiv.org/abs/2404.00456