(Pierro & Abreu, 2024)

Sources: arXiv:2407.12397 · alphaXiv overview

Core contribution

Mamba-PTQ shows that Mamba-style recurrent LLMs still exhibit activation outlier channels and that accuracy under naive post-training quantization degrades sharply when those outliers are ignored. The important result is not that Mamba quantization is “solved,” but that recurrent/state-space models do not escape the outlier problem by changing the sequence mixer.
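
A toy numerical sketch of that failure mode (NumPy, synthetic data; not the paper's code): a single outlier channel sets the per-tensor INT8 scale, and every other channel is left with almost no resolution.

```python
# Minimal sketch: one activation outlier channel dominates a per-tensor INT8
# scale and crushes resolution for the inlier channels. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(1024, 256)).astype(np.float32)  # tokens x channels
acts[:, 7] *= 100.0  # one outlier channel, of the kind Mamba-PTQ reports

def quantize_per_tensor(x, bits=8):
    # Symmetric quantization with a single scale for the whole tensor.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale

q = quantize_per_tensor(acts)
inlier = np.delete(np.arange(acts.shape[1]), 7)
err = np.abs(q - acts)[:, inlier].mean()
print(f"mean abs error on inlier channels (per-tensor INT8): {err:.3f}")
# The step size is ~100x larger than the inlier channels need, so their values
# collapse onto a handful of quantization levels.
```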

Why this matters for Parameter Golf

This paper closes an easy loophole in compact-model intuition. It is tempting to think that if transformers quantize poorly, a recurrent or state-space alternative might be naturally cleaner. Mamba-PTQ says that is too optimistic: if we widen the architectural search into SSMs, we inherit another version of the same saliency and outlier problem.

What to import

  • Outlier handling remains central even in recurrent/state-space LMs (a toy mitigation sketch follows this list).
  • Activation outliers, not only weight distributions, are the key quantization obstacle.
  • Alternative sequence mixers should be judged jointly with their compression path, not only with perplexity or runtime.
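
For intuition on the first two points, the sketch below applies one common mitigation, per-channel activation scales, to the same kind of synthetic tensor. This is an illustrative option rather than a scheme Mamba-PTQ prescribes, and it is exactly the kind of mitigation whose hardware cost has to be weighed: per-channel activation scales sit on the reduction axis of the following matmul, which standard integer GEMM kernels cannot absorb cheaply.

```python
# Sketch of one common mitigation, per-channel activation scales, on the same
# synthetic setup as above. Illustrative only; not the paper's recipe.
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(1024, 256)).astype(np.float32)
acts[:, 7] *= 100.0  # the same single outlier channel

def quantize_per_channel(x, bits=8):
    # One symmetric scale per channel: the outlier no longer sets everyone's step size.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=0, keepdims=True) / qmax
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale

q = quantize_per_channel(acts)
inlier = np.delete(np.arange(acts.shape[1]), 7)
err = np.abs(q - acts)[:, inlier].mean()
print(f"mean abs error on inlier channels (per-channel INT8): {err:.4f}")
# Inlier error falls by roughly the factor the outlier had inflated the shared
# scale, but the per-channel scales complicate integer-GEMM execution, so the
# hardware-friendliness of the mitigation still has to be justified.
```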

What not to over-import

This is preliminary quantization work and not a mature recipe for winning export pipelines. Its direct experimental results are more warning than solution. The lasting lesson is that architectural novelty does not excuse us from byte-aware quantization analysis.

Parameter Golf translation

If we explore SSM or recurrent candidates, we should ask immediately:

  • where are the activation outliers (a diagnostic sketch follows this list),
  • how hardware-friendly is any outlier mitigation,
  • and whether the resulting compression path is actually better than a more conventional transformer baseline.
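
For the first question, a minimal diagnostic is to hook the candidate's projection outputs during a short calibration run and track per-channel activation peaks. The two-layer torch model below is a hypothetical stand-in for whatever Mamba or transformer block is under evaluation; only the hook pattern is meant to carry over.

```python
# Hedged diagnostic sketch: record running per-channel absolute maxima of each
# linear module's output over a tiny "calibration" loop, then look for channels
# whose peak dwarfs the median. The model here is a toy stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))
stats = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Per-channel absolute max of this module's output, reduced over all
        # leading (batch/token) dimensions.
        chan_max = output.detach().abs().amax(dim=tuple(range(output.dim() - 1)))
        stats[name] = torch.maximum(stats.get(name, chan_max), chan_max)
    return hook

handles = [m.register_forward_hook(make_hook(n))
           for n, m in model.named_modules() if isinstance(m, nn.Linear)]

with torch.no_grad():
    for _ in range(8):                 # stand-in for a small calibration set
        model(torch.randn(32, 64))

for name, chan_max in stats.items():
    ratio = (chan_max.max() / chan_max.median()).item()
    print(f"{name}: max/median per-channel peak = {ratio:.1f}")  # large ratios flag outlier channels

for h in handles:
    h.remove()
```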

Pierro, A., & Abreu, S. (2024). Mamba-PTQ: Outlier Channels in Recurrent Large Language Models. arXiv preprint arXiv:2407.12397. https://arxiv.org/abs/2407.12397