(He et al., 2024)

Sources: arXiv:2410.01309 · alphaXiv overview

Core contribution

This paper shows that some model descriptions waste bits by encoding an arbitrary choice among weight configurations that are equivalent up to symmetry. By applying bits-back coding to the rotational symmetries exposed by SliceGPT-style preprocessing, it recovers a few percent of model size essentially “for free.”
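To make the symmetry concrete, here is a minimal sketch (a hypothetical 2-D toy, not the authors' code) of the invariance being exploited: for two consecutive linear maps, inserting any rotation Q between them as W1' = Q·W1 and W2' = W2·Qᵀ leaves the composed function unchanged, so the particular Q baked into a stored checkpoint is an arbitrary coordinate choice rather than information.

```python
import math

def matmul(A, B):
    # Plain nested-loop matrix product for small dense matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# Arbitrary toy weights for two consecutive linear layers.
W1 = [[1.0, 2.0], [3.0, 4.0]]
W2 = [[0.5, -1.0], [2.0, 0.25]]

theta = 0.7  # any rotation angle works
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

# Rotate the parameterization: W1' = Q W1, W2' = W2 Q^T.
W1p = matmul(Q, W1)
W2p = matmul(W2, transpose(Q))

x = [[1.0], [-2.0]]
y  = matmul(W2, matmul(W1, x))    # original network
yp = matmul(W2p, matmul(W1p, x))  # rotated network

# Because Q^T Q = I, both parameterizations compute the same function.
assert all(abs(y[i][0] - yp[i][0]) < 1e-9 for i in range(2))
```

Since every choice of Q yields the same function, the bits spent pinning down one particular Q are exactly the redundancy that bits-back coding can reclaim.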

Why this matters for Parameter Golf

This is unusually valuable because it changes the question from “how do we distort the model less?” to “which bits are redundant even before distortion starts?” For a hard artifact cap, a free 3–5% size reduction can be a serious margin.

What to import

  • Symmetry is a storage opportunity.
  • Some bytes are redundant because the model has multiple equivalent descriptions.
  • Training-free savings matter when margins are thin.
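The second bullet can be illustrated with a deliberately tiny stand-in for the paper's scheme (a permutation symmetry rather than a rotation, and entirely hypothetical code): when two stored orderings describe the same model, the otherwise-arbitrary ordering choice can carry one payload bit at no extra cost.

```python
# Toy bits-back intuition: suppose (a, b) and (b, a) are equivalent
# descriptions of the same model (assumes a != b). Instead of fixing
# an arbitrary canonical order, use the order to smuggle one bit.

def encode(pair, payload_bit):
    a, b = sorted(pair)
    return (a, b) if payload_bit == 0 else (b, a)

def decode(stored):
    a, b = stored
    payload_bit = 0 if a <= b else 1
    model = tuple(sorted(stored))  # canonical form: same model either way
    return model, payload_bit

stored = encode((3.5, 1.25), payload_bit=1)
model, bit = decode(stored)
assert model == (1.25, 3.5) and bit == 1
```

With k equivalent descriptions the recoverable side-channel is log2(k) bits; the paper's contribution is doing this for continuous rotation groups, where the accounting is subtler but the mindset is the same.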

What not to over-import

The demonstrated gains are modest, and the method depends on a specific symmetry-exposing setup. It is not proof that all compact artifacts have huge hidden symmetry reserves. The lasting lesson is the storage mindset: some bytes can be recovered without changing the model’s function at all.

Parameter Golf translation

This suggests a practical new question:

  • what parts of our artifact are genuinely informative,
  • and what parts are only one arbitrary coordinate choice among equivalent ones?

That question may be especially powerful in shared or transformed architectures.

He, J., Flamich, G., & Hernández-Lobato, J. M. (2024). Getting free bits back from rotational symmetries in LLMs. arXiv preprint arXiv:2410.01309. https://arxiv.org/abs/2410.01309