Sources: arXiv:2505.03031 · alphaXiv overview
Core contribution
Radio reframes LLM quantization as an explicit rate-distortion optimization problem. Instead of picking one quantizer and then arguing about average bit-width, it asks how to allocate scarce bits so that the marginal reduction in distortion per extra bit is balanced across model components.
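That allocation rule can be sketched as a greedy loop: at each step, give one more bit to whichever component currently buys the largest distortion drop per extra bit, until the budget runs out. This is a minimal illustrative sketch, not Radio's actual algorithm; the quadratic distortion model (error shrinking roughly 4x per added bit, as for a uniform quantizer) and the `sensitivities` weights are assumptions for the example.

```python
def allocate_bits(sensitivities, total_bits, max_bits=8):
    """Greedy bit allocation: sensitivities[i] weights component i's
    quantization error; total_bits is the shared budget."""
    bits = [0] * len(sensitivities)

    def distortion(i, b):
        # Illustrative uniform-quantizer model: error ~ 2^(-2b) = 4^(-b).
        return sensitivities[i] * 4.0 ** (-b)

    for _ in range(total_bits):
        # Marginal gain of spending one more bit on each component.
        gains = [
            distortion(i, bits[i]) - distortion(i, bits[i] + 1)
            if bits[i] < max_bits else 0.0
            for i in range(len(bits))
        ]
        best = max(range(len(gains)), key=gains.__getitem__)
        if gains[best] <= 0:
            break
        bits[best] += 1
    return bits
```

With `allocate_bits([8.0, 1.0, 1.0], total_bits=6)` the sensitive first component ends up with more bits than the other two, which is the whole point: equal bit-widths are only optimal when sensitivities are equal.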
Why this matters for Parameter Golf
This is one of the cleanest papers for a hard-cap setting because Parameter Golf is already a rate-distortion problem in disguise. The artifact cap means the real question is not just “which quantizer is best?” but “which extra stored bytes recover the most post-roundtrip language-model quality?”
What to import
- Bit allocation should be explicit. A byte spent on one tensor is a byte not spent elsewhere.
- Average bit-width is too blunt. Two methods with the same nominal bits can have very different byte ROI once scales, grouping, and metadata are counted.
- Grouping overhead is part of the objective. Small groups may improve distortion but can lose once extra side information is counted.
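The second and third bullets reduce to one bookkeeping identity: the true storage cost per weight is the nominal bit-width plus the amortized side information. A small sketch, with assumed metadata sizes (fp16 scale, int8 zero point per group) purely for illustration:

```python
def effective_bits_per_weight(nominal_bits, group_size,
                              scale_bits=16, zero_point_bits=8):
    """Nominal bit-width plus per-group metadata amortized over the group.
    Metadata sizes here are assumptions, not universal constants."""
    overhead = (scale_bits + zero_point_bits) / group_size
    return nominal_bits + overhead
```

Under these assumptions, "4-bit" with group size 32 actually stores 4.75 bits per weight, while group size 128 stores about 4.19; a method that looks worse per-group can win on byte ROI once the side channel is counted.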
What not to over-import
Radio is still a quantization paper, not a full artifact-cap optimizer for every downstream codec. Its distortion proxy is useful, but it does not prove that the exact same ranking remains optimal after all packing, serialization, and challenge-specific execution constraints are applied.
Best synthesis links
- Deepens "Byte allocation beats average bit-width" by giving that frontier an information-theoretic frame.
- Pairs naturally with OWQ, which gives a concrete way to protect the most sensitive columns instead of distributing bits democratically.
- Connects to ReALLM, where the true object being budgeted is no longer only weight precision but also residual structure and decoder capacity.
Parameter Golf translation
The practical lesson is to rank candidate exception paths by recovered quality per stored byte:
- protected rows or columns
- codebook size
- group size
- residual low-rank terms
- any structured side channel
That framing is more likely to produce leaderboard-relevant decisions than comparing methods only by advertised low-bit precision.
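The ranking discipline above can be made mechanical: score every candidate exception path by recovered quality per stored byte, then fill the artifact cap greedily. A minimal sketch; the candidate names, quality gains, and byte costs below are hypothetical placeholders, not measurements from the paper.

```python
def pick_upgrades(candidates, byte_budget):
    """candidates: list of (name, quality_gain, extra_bytes).
    Greedily select by quality recovered per stored byte under the cap."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen, spent = [], 0
    for name, gain, cost in ranked:
        if spent + cost <= byte_budget:
            chosen.append(name)
            spent += cost
    return chosen

# Hypothetical exception paths with made-up (gain, bytes) numbers.
candidates = [
    ("protect 16 outlier columns", 0.40, 64_000),
    ("double codebook size",       0.15, 32_000),
    ("halve group size",           0.10, 96_000),
    ("rank-8 residual",            0.25, 48_000),
]
```

Greedy selection by ratio is only a heuristic (the exact problem is a knapsack), but it makes the trade-offs explicit: here "halve group size" loses not because it is useless, but because its bytes buy less than the alternatives.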
Related
- Byte allocation beats average bit-width
- Entropy-friendly model structure
- OWQ
- pQuant
- ReALLM
- Quantization and outliers