Sources: arXiv:2407.12397 · alphaXiv overview
Core contribution
Mamba-PTQ shows that Mamba-style recurrent LLMs still exhibit activation outlier channels and that naive post-training quantization degrades sharply when those outliers are ignored. The important result is not that Mamba quantization is “solved,” but that recurrent/state-space models do not escape the outlier problem by changing the sequence mixer.
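To make the failure mode concrete, here is a minimal synthetic sketch (not taken from the paper's experiments): a single outlier channel inflates a per-tensor quantization scale, so every well-behaved channel loses resolution, whereas per-channel scales contain the damage.

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(1024, 512)).astype(np.float32)
acts[:, 7] *= 80.0  # one outlier channel now dominates the dynamic range

def fake_quant(x, scale):
    """Symmetric int8 round-trip (quantize, clip, dequantize)."""
    return np.clip(np.round(x / scale), -127, 127) * scale

# Per-tensor scale: the outlier channel sets the step size for every channel.
scale_tensor = np.abs(acts).max() / 127.0
mse_tensor = np.mean((acts - fake_quant(acts, scale_tensor)) ** 2)

# Per-channel scales: the outlier no longer steals resolution from the rest.
scale_channel = np.abs(acts).max(axis=0, keepdims=True) / 127.0
mse_channel = np.mean((acts - fake_quant(acts, scale_channel)) ** 2)

print(f"per-tensor  MSE: {mse_tensor:.5f}")   # large: normal channels crushed
print(f"per-channel MSE: {mse_channel:.5f}")  # orders of magnitude smaller
```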
Why this matters for Parameter Golf
This paper closes an easy loophole in compact-model intuition. It is tempting to think that if transformers quantize poorly, a recurrent or state-space alternative might be naturally cleaner. Mamba-PTQ says that is too optimistic: if we widen the architectural search into SSMs, we inherit another version of the same saliency and outlier problem.
What to import
- Outlier handling remains central even in recurrent/state-space LMs.
- Activation outliers, not only weight distributions, are the key quantization obstacle (illustrated in the sketch after this list).
- Alternative sequence mixers should be judged jointly with their compression path, not only with perplexity or runtime.
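A small illustration of the second bullet, again with synthetic tensors rather than measurements from the paper: weight statistics can look harmless while the activations feeding the quantizer are heavy-tailed, so a weights-only inspection misses the real obstacle.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(0.0, 0.02, size=(512, 512))  # unremarkable weight matrix
acts = rng.normal(0.0, 1.0, size=(4096, 512))
acts[:, [3, 91]] *= 60.0                          # two hidden outlier channels

def max_to_rms(x):
    """Crude heavy-tail indicator: peak magnitude relative to RMS."""
    return np.abs(x).max() / np.sqrt(np.mean(x ** 2))

print(f"weights     max/RMS: {max_to_rms(weights):6.1f}")  # single digits: benign
print(f"activations max/RMS: {max_to_rms(acts):6.1f}")     # tens: outlier-driven
```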
What not to over-import
This is preliminary quantization work and not a mature recipe for winning export pipelines. Its direct experimental results are more warning than solution. The lasting lesson is that architectural novelty does not excuse us from byte-aware quantization analysis.
Best synthesis links
- Extends AWQ and PTQ1.61 into the state-space setting.
- Provides the compression-side counterpart to Transformers are SSMs.
- Strengthens quantization and outlier handling by showing that the outlier story is architecture-agnostic enough to survive the transformer boundary.
Parameter Golf translation
If we explore SSM or recurrent candidates, we should ask immediately (a minimal audit sketch follows this list):
- where the activation outliers are,
- how hardware-friendly any outlier mitigation is,
- and whether the resulting compression path is actually better than a more conventional transformer baseline.
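As a starting point for that audit, here is a hedged PyTorch sketch (`model` and `calib_loader` are placeholders for whatever SSM candidate and calibration data we use; nothing here comes from the Mamba-PTQ code) that flags channels whose peak activation dwarfs the layer's median channel peak:

```python
import torch

def audit_outlier_channels(model, calib_loader, ratio_threshold=20.0):
    """Flag activation channels whose peak magnitude dwarfs the layer median.

    Assumes leaf modules emit (batch, ..., channels)-shaped float tensors and
    that `calib_loader` yields inputs the model accepts directly.
    """
    stats = {}  # module name -> running per-channel max |activation|

    def make_hook(name):
        def hook(module, inputs, output):
            out = output[0] if isinstance(output, tuple) else output
            if not torch.is_tensor(out) or not out.is_floating_point() or out.dim() < 2:
                return
            # Collapse batch/time dims, keep the trailing channel dim.
            ch_max = out.detach().abs().flatten(0, -2).max(dim=0).values
            prev = stats.get(name)
            stats[name] = ch_max if prev is None else torch.maximum(prev, ch_max)
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules()
               if len(list(m.children())) == 0]
    with torch.no_grad():
        for batch in calib_loader:
            model(batch)
    for h in handles:
        h.remove()

    # Channels whose max sits far above the layer's median channel max are
    # the ones that would dictate (and wreck) a shared quantization scale.
    return {name: (ch_max / ch_max.median().clamp_min(1e-6) > ratio_threshold)
                  .nonzero().flatten().tolist()
            for name, ch_max in stats.items()}
```

The max-to-median ratio is only one crude indicator; kurtosis or percentile-clipped ranges would serve the same screening role. Any channel this flags is a candidate for per-channel scales, clipping, or a smoothing transform, and the cost of those mitigations is part of the comparison against the transformer baseline.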