(Panferov et al., 2025)

Sources: arXiv:2502.05003 · alphaXiv overview

Core contribution

QuEST tackles one of the hardest versions of low-bit training: 1-bit weights and activations. Its central message is that success requires improving both sides of the learning problem:

  • the forward pass must fit low-bit distributions more faithfully
  • the backward pass must reduce gradient bias and instability
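
The two-sided framing can be made concrete with a toy binarizer. This is a minimal, illustrative sketch and not the paper's actual recipe: the forward pass fits a per-tensor scale to the weight distribution, and the backward pass uses a clipped straight-through estimator so that saturated weights stop receiving gradient.

```python
import numpy as np

def binarize_forward(w):
    # Forward: map weights to {-s, +s}. The mean absolute value is the
    # MSE-optimal per-tensor scale for a sign quantizer.
    s = np.abs(w).mean()
    return s * np.sign(w), s

def binarize_backward(w, grad_out, clip=1.0):
    # Backward: clipped straight-through estimator (STE). Passing gradients
    # only where |w| <= clip curbs the bias a naive identity STE introduces
    # for weights already deep in saturation.
    mask = (np.abs(w) <= clip).astype(w.dtype)
    return grad_out * mask

w = np.array([0.3, -1.7, 0.05, 0.9])
q, s = binarize_forward(w)                  # q = s * sign(w), s = mean(|w|)
g = binarize_backward(w, np.ones_like(w))   # zero gradient at w = -1.7
```

The point of the sketch is that the two halves are independent levers: a better scale fit improves the forward approximation without touching gradient flow, and the clipping mask changes gradient flow without touching the forward values.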

That makes the paper a strong reminder that aggressive compression is not only a storage problem but a training-dynamics problem.

Why this matters for Parameter Golf

Parameter Golf is judged on post-roundtrip quality, so any intervention that reduces the mismatch between train-time behavior and compressed behavior is unusually important. QuEST is valuable because it frames low-bit failure mechanistically: if the training dynamics are misaligned with the final representation, export tricks alone may hit a ceiling.

What to import

  • Forward approximation quality matters.
  • Backward bias matters separately.
  • Very low-bit success requires co-design of optimization and representation.
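
The first bullet can be checked empirically. A hedged sketch of the forward half: for a sign quantizer the scale that minimizes forward mean-squared error has a closed form (the mean absolute weight), and a crude, distribution-blind scale measurably inflates the approximation error.

```python
import numpy as np

def quant_mse(w, s):
    # Forward approximation error of a {-s, +s} sign quantizer.
    return float(np.mean((w - s * np.sign(w)) ** 2))

rng = np.random.default_rng(0)
w = rng.normal(size=10_000)          # stand-in for a trained weight tensor

s_fit = np.abs(w).mean()             # closed-form MSE minimizer: E[|w|]
err_fit = quant_mse(w, s_fit)
err_naive = quant_mse(w, 1.0)        # a crude scale that ignores the distribution
# err_fit < err_naive: fitting the forward quantizer to the actual
# distribution directly shrinks the train/compressed mismatch.
```

At the optimum the error simplifies to E[w²] - E[|w|]², which makes explicit how much forward quality depends on matching the scale to the distribution rather than on the bit width alone.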

What not to over-import

The paper does not imply that every research loop should chase full 1-bit training. The literal regime may be too aggressive for current local constraints. The transferable lesson is that training-side fixes deserve first-class attention whenever export-side methods seem to plateau.

Connections

  • Complements Extra RMSNorm: one paper stabilizes the signal path architecturally, the other stabilizes training dynamics algorithmically.
  • Sits near BitNet b1.58 as another argument that ultra-low-bit modeling wants native recipes.
  • Helps interpret quantization and outlier handling as a training problem, not just a codec problem.

Parameter Golf translation

QuEST suggests asking, before inventing more elaborate export formats:

  • are the training dynamics already aligned with the compressed representation?
  • is the forward quantization model too crude for the regime?
  • are observed gains coming from genuinely better low-bit fit or from brittle proxy effects?
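
The first question admits a direct diagnostic, sketched here with hypothetical helper names: the quantize-dequantize path seen during training should match the exported representation bit-for-bit, and any gap is a direct source of post-roundtrip degradation.

```python
import numpy as np

def fake_quant(w):
    # Train-time quantizer: quantize-dequantize, computed in float.
    s = np.abs(w).mean()
    return s * np.sign(w)

def export_quant(w):
    # Export-time representation: a sign tensor plus one scale.
    s = np.abs(w).mean()
    signs = (w >= 0).astype(np.int8) * 2 - 1   # values in {-1, +1}
    return signs, s

def export_dequant(signs, s):
    return s * signs.astype(np.float64)

# Alignment check: training dynamics are only aligned with the compressed
# representation if the two paths agree exactly.
w = np.random.default_rng(1).normal(size=256)
gap = float(np.max(np.abs(fake_quant(w) - export_dequant(*export_quant(w)))))
```

If `gap` is nonzero, the model was optimized against a forward pass it will never see after export, which is precisely the mismatch the note warns about.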

Reference

Panferov, A., Chen, J., Tabesh, S., Castro, R. L., Nikdan, M., & Alistarh, D. (2025). QuEST: Stable Training of LLMs with 1-Bit Weights and Activations. arXiv preprint arXiv:2502.05003. https://arxiv.org/abs/2502.05003