Sources: arXiv:2502.05003 · alphaXiv overview
Core contribution
QuEST tackles one of the hardest versions of low-bit training: 1-bit weights and activations. Its central message is that success requires improving both sides of the learning problem:
- the forward pass must fit low-bit distributions more faithfully (better quantizer design)
- the backward pass must reduce gradient bias and instability (better gradient estimation through the non-differentiable quantizer)
That makes the paper a strong reminder that aggressive compression is not only a storage problem but a training-dynamics problem.
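The forward/backward split above can be made concrete with a generic quantization-aware-training sketch. This is not QuEST's actual estimator (the paper proposes its own forward and backward improvements); it is the textbook baseline those improvements target: a sign-based 1-bit forward pass with an L2-optimal per-tensor scale, and a straight-through estimator (STE) backward pass that clips gradients outside the quantizer's active range. The function names are illustrative.

```python
import numpy as np

def quantize_1bit(w):
    """Forward: 1-bit (sign) quantization with a per-tensor scale.
    scale = mean(|w|) minimizes ||w - scale * sign(w)||_2, a common
    choice for binary weights."""
    scale = np.mean(np.abs(w))
    return scale * np.sign(w)

def ste_backward(grad_out, w, clip=1.0):
    """Backward: straight-through estimator. The gradient passes through
    the non-differentiable sign() as if it were the identity, but is
    masked where |w| exceeds the clip range. The mask limits, without
    eliminating, the bias this estimator introduces -- the bias QuEST's
    backward-side fixes are aimed at."""
    return grad_out * (np.abs(w) <= clip)

# Tiny demo: quantized forward values and the masked gradient.
w = np.array([0.3, -1.5, 0.05, 0.8])
wq = quantize_1bit(w)          # scale 0.6625 times the sign pattern
gw = ste_backward(np.ones_like(w), w)  # gradient zeroed at |w| > 1
```

The point of the sketch is that the two halves fail independently: `quantize_1bit` can fit the weight distribution poorly even when `ste_backward` is well behaved, and vice versa, which is why the paper treats them as two separate problems.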
Why this matters for Parameter Golf
Parameter Golf is judged on post-roundtrip quality, so any intervention that reduces the mismatch between train-time behavior and compressed behavior is unusually important. QuEST is valuable because it frames low-bit failure mechanistically: if the training dynamics are misaligned with the final representation, export tricks alone may hit a ceiling.
What to import
- Forward approximation quality matters: the quantized forward pass should track the full-precision function closely.
- Backward bias matters separately: a good forward fit does not guarantee low-bias gradient estimates.
- Very low-bit success requires co-design of optimization and representation.
What not to over-import
The paper does not imply that every research loop should chase full 1-bit training. The literal regime may be too aggressive for current local constraints. The transferable lesson is that training-side fixes deserve first-class attention whenever export-side methods seem to plateau.
Best synthesis links
- Complements Extra RMSNorm: one paper stabilizes the signal path architecturally, the other stabilizes training dynamics algorithmically.
- Sits near BitNet b1.58 as another argument that ultra-low-bit modeling wants native recipes.
- Helps interpret quantization and outlier handling as a training problem, not just a codec problem.
Parameter Golf translation
QuEST suggests asking, before inventing more elaborate export formats:
- are the training dynamics already aligned with the compressed representation?
- is the forward quantization model too crude for the regime?
- are observed gains coming from genuinely better low-bit fit or from brittle proxy effects?
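The first two questions can be probed cheaply before touching training code. A minimal sketch, assuming nothing about QuEST's internals: compare the roundtrip error of a crude quantizer against a finer-grained one on the same tensor. If the finer scheme closes most of the gap, the forward model, not the export format, is the bottleneck. The helper and lambda names are illustrative.

```python
import numpy as np

def roundtrip_error(w, quantize):
    """Relative L2 error between a tensor and its quantized roundtrip --
    a cheap proxy for train-time vs. compressed-behavior mismatch."""
    wq = quantize(w)
    return np.linalg.norm(w - wq) / np.linalg.norm(w)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))

# Crude forward model: one scale for the whole tensor.
per_tensor = lambda w: np.mean(np.abs(w)) * np.sign(w)
# Finer forward model: one scale per output row.
per_row = lambda w: np.mean(np.abs(w), axis=1, keepdims=True) * np.sign(w)

e_tensor = roundtrip_error(w, per_tensor)
e_row = roundtrip_error(w, per_row)   # never worse than per-tensor
```

Because `mean(|w|)` is the L2-optimal binary scale for the block it covers, the per-row error is mathematically at most the per-tensor error; how large the gap is on real weight tensors is the diagnostic signal.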
Related
- Extra RMSNorm
- BitNet b1.58
- pQuant
- Quantization and outliers
- Normalization before projections
- RMSNorm stabilized scaling