(Zhao et al., 2025)

Sources: arXiv:2502.13179 · alphaXiv overview

Core contribution

PTQ1.61 aims to make true sub-2-bit post-training quantization (an average of 1.61 bits per weight, hence the name) practical by preserving salient structure with less overhead than earlier mixed-precision or exception-heavy methods. The key message is that post-training methods are not exempt from the outlier problem; they simply solve it under different constraints.
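The split between a cheap low-bit path and a protected salient path can be sketched as follows. This is a generic illustration of sparse outlier preservation, not PTQ1.61's actual algorithm: the magnitude-based saliency criterion, the `keep_frac` parameter, and the symmetric uniform quantizer are all simplifying assumptions.

```python
import numpy as np

def quantize_with_salient_preservation(w, keep_frac=0.02, bits=2):
    """Quantize a weight matrix to `bits` bits while keeping the top
    `keep_frac` fraction of weights (by magnitude) in full precision.
    Generic sketch of salient-weight preservation, not the paper's method."""
    flat = np.abs(w).ravel()
    k = max(1, int(keep_frac * flat.size))
    thresh = np.partition(flat, -k)[-k]        # k-th largest magnitude
    salient_mask = np.abs(w) >= thresh

    # Cheap path: symmetric uniform quantization of the non-salient bulk.
    levels = 2 ** bits
    bulk = np.where(salient_mask, 0.0, w)
    max_abs = np.abs(bulk).max()
    scale = max_abs / (levels // 2) if max_abs > 0 else 1.0
    q = np.clip(np.round(bulk / scale), -(levels // 2), levels // 2 - 1)
    deq = q * scale

    # Expensive path: salient weights bypass quantization entirely.
    out = np.where(salient_mask, w, deq)
    return out, salient_mask
```

Because the largest-magnitude weights are excluded from the bulk, the quantizer's scale shrinks and the bulk is represented more finely, which is exactly why protecting a small salient set can pay for its own overhead.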

Why this matters for Parameter Golf

This paper matters because it shows that the same structural story appears even when training is held fixed. That is useful for the knowledge garden: it means the case for selective preservation is not just a quantization-aware-training artifact. Even pure export pipelines live or die by how they isolate what the cheap path cannot carry.

What to import

  • Sub-2-bit PTQ is possible only when saliency is handled intelligently.
  • Metadata overhead is a first-class metric.
  • Preprocessing and representation choices can matter as much as the nominal bit-width.
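The second bullet can be made concrete: when comparing formats, count every bit of bookkeeping, not just the nominal bit-width. A minimal accounting sketch, assuming a hypothetical format that stores salient values with explicit indices and per-group scales; the function, its parameters, and all defaults are illustrative, not taken from the paper.

```python
def effective_bits_per_weight(n_weights, bulk_bits, n_salient,
                              salient_bits=16, index_bits=32,
                              scale_bits=16, group_size=128):
    """Effective storage cost in bits per weight once bookkeeping is
    counted. Illustrative accounting only; real overhead depends on the
    format (bitmap vs. index list, per-group vs. per-channel scales)."""
    bulk = (n_weights - n_salient) * bulk_bits       # low-bit majority
    salient = n_salient * (salient_bits + index_bits)  # value + position
    scales = (n_weights // group_size) * scale_bits    # per-group metadata
    return (bulk + salient + scales) / n_weights
```

Under these assumptions, a "2-bit" layer with 1% of weights protected already costs noticeably more than 2 bits per weight, which is why the note treats metadata overhead as a first-class metric rather than a footnote.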

What not to over-import

PTQ1.61 does not prove that any specific saliency heuristic will survive this challenge’s exact workload. It also does not eliminate the risk that protecting some structure helps a proxy benchmark more than the true target. The transferable lesson is that salient preservation must be byte-aware from the start.

Connections

  • Supports sparse outlier preservation from the PTQ side rather than the training side.
  • Complements MicroScopiQ on the systems implications of preserving structure.
  • Sits between pQuant and QuaRot as a middle ground: protect saliency without necessarily rotating the whole representation or redesigning training.

Parameter Golf translation

PTQ1.61 suggests evaluating candidate export formats by asking:

  • Can the salient subset be identified cheaply?
  • How much bookkeeping does protection require?
  • Is the resulting format still simpler and more size-efficient than slightly widening the baseline model?
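The third question reduces to arithmetic once the bookkeeping is tallied. A hypothetical comparison, assuming the alternative is a plain `baseline_bits` model widened by `widen_factor`; the function and its defaults are illustrative assumptions, not numbers from the paper.

```python
def format_beats_wider_baseline(n_weights, eff_bits,
                                widen_factor=1.10, baseline_bits=4):
    """Is a protected low-bit format (costing `eff_bits` bits/weight,
    bookkeeping included) smaller than a `baseline_bits` model widened
    by `widen_factor`? All parameters are illustrative assumptions."""
    protected_bytes = n_weights * eff_bits / 8
    widened_bytes = n_weights * widen_factor * baseline_bits / 8
    return protected_bytes < widened_bytes
```

For example, under these defaults a format whose effective cost is 2.5 bits/weight beats a 10%-wider 4-bit baseline, while one that balloons to 5 bits/weight does not.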

Reference

Zhao, J., Zhang, M., Wang, M., Shang, Y., Zhang, K., Guan, W., Wang, Y., & Zhang, M. (2025). PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models. arXiv preprint arXiv:2502.13179. https://arxiv.org/abs/2502.13179