Sources: arXiv:2407.12397 · alphaXiv overview
Core contribution
Mamba-PTQ shows that Mamba-style recurrent LLMs still exhibit activation outlier channels and that naive post-training quantization degrades sharply when those outliers are ignored. The important result is not that Mamba quantization is “solved,” but that recurrent/state-space models do not escape the outlier problem by changing the sequence mixer.
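To make the failure mode concrete, here is a minimal synthetic sketch (not taken from the paper's experiments): a single outlier channel inflates a per-tensor quantization scale, so every well-behaved channel loses resolution, whereas per-channel scales contain the damage.

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(1024, 512)).astype(np.float32)
acts[:, 7] *= 80.0  # one outlier channel now dominates the dynamic range

def fake_quant(x, scale):
    """Symmetric int8 round-trip (quantize, clip, dequantize)."""
    return np.clip(np.round(x / scale), -127, 127) * scale

# Per-tensor scale: the outlier channel sets the step size for every channel.
scale_tensor = np.abs(acts).max() / 127.0
mse_tensor = np.mean((acts - fake_quant(acts, scale_tensor)) ** 2)

# Per-channel scales: the outlier no longer steals resolution from the rest.
scale_channel = np.abs(acts).max(axis=0, keepdims=True) / 127.0
mse_channel = np.mean((acts - fake_quant(acts, scale_channel)) ** 2)

print(f"per-tensor  MSE: {mse_tensor:.5f}")   # large: normal channels crushed
print(f"per-channel MSE: {mse_channel:.5f}")  # orders of magnitude smaller
```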
Why this matters for Parameter Golf
This paper closes an easy loophole in compact-model intuition. It is tempting to think that if transformers quantize poorly, a recurrent or state-space alternative might be naturally cleaner. Mamba-PTQ says that is too optimistic: if we widen the architectural search into SSMs, we inherit another version of the same saliency and outlier problem.
What to import
- Outlier handling remains central even in recurrent/state-space LMs.
- Activation outliers, not only weight distributions, are the key quantization obstacle (illustrated in the sketch after this list).
- Alternative sequence mixers should be judged jointly with their compression path, not only with perplexity or runtime.
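A small illustration of the second bullet, again with synthetic tensors rather than measurements from the paper: weight statistics can look harmless while the activations feeding the quantizer are heavy-tailed, so a weights-only inspection misses the real obstacle.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(0.0, 0.02, size=(512, 512))  # unremarkable weight matrix
acts = rng.normal(0.0, 1.0, size=(4096, 512))
acts[:, [3, 91]] *= 60.0                          # two hidden outlier channels

def max_to_rms(x):
    """Crude heavy-tail indicator: peak magnitude relative to RMS."""
    return np.abs(x).max() / np.sqrt(np.mean(x ** 2))

print(f"weights     max/RMS: {max_to_rms(weights):6.1f}")  # single digits: benign
print(f"activations max/RMS: {max_to_rms(acts):6.1f}")     # tens: outlier-driven
```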
What not to over-import
This is preliminary quantization work and not a mature recipe for winning export pipelines. Its direct experimental results are more warning than solution. The lasting lesson is that architectural novelty does not excuse us from byte-aware quantization analysis.
Best synthesis links
- Extends AWQ and PTQ1.61 into the state-space setting.
- Provides the compression-side counterpart to Transformers are SSMs.
- Strengthens quantization and outlier handling by showing that the outlier story is architecture-agnostic enough to survive the transformer boundary.
Parameter Golf translation
If we explore SSM or recurrent candidates, we should ask immediately (a minimal audit sketch follows this list):
- where the activation outliers are,
- how hardware-friendly any outlier mitigation is,
- and whether the resulting compression path is actually better than a more conventional transformer baseline.
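As a starting point for that audit, here is a hedged PyTorch sketch (`model` and `calib_loader` are placeholders for whatever SSM candidate and calibration data we use; nothing here comes from the Mamba-PTQ code) that flags channels whose peak activation dwarfs the layer's median channel peak:

```python
import torch

def audit_outlier_channels(model, calib_loader, ratio_threshold=20.0):
    """Flag activation channels whose peak magnitude dwarfs the layer median.

    Assumes leaf modules emit (batch, ..., channels)-shaped float tensors and
    that `calib_loader` yields inputs the model accepts directly.
    """
    stats = {}  # module name -> running per-channel max |activation|

    def make_hook(name):
        def hook(module, inputs, output):
            out = output[0] if isinstance(output, tuple) else output
            if not torch.is_tensor(out) or not out.is_floating_point() or out.dim() < 2:
                return
            # Collapse batch/time dims, keep the trailing channel dim.
            ch_max = out.detach().abs().flatten(0, -2).max(dim=0).values
            prev = stats.get(name)
            stats[name] = ch_max if prev is None else torch.maximum(prev, ch_max)
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules()
               if len(list(m.children())) == 0]
    with torch.no_grad():
        for batch in calib_loader:
            model(batch)
    for h in handles:
        h.remove()

    # Channels whose max sits far above the layer's median channel max are
    # the ones that would dictate (and wreck) a shared quantization scale.
    return {name: (ch_max / ch_max.median().clamp_min(1e-6) > ratio_threshold)
                  .nonzero().flatten().tolist()
            for name, ch_max in stats.items()}
```

The max-to-median ratio is only one crude indicator; kurtosis or percentile-clipped ranges would serve the same screening role. Any channel this flags is a candidate for per-channel scales, clipping, or a smoothing transform, and the cost of those mitigations is part of the comparison against the transformer baseline.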