Sources: arXiv:2410.01309 · alphaXiv overview
Core contribution
This paper shows that model descriptions can waste bits by redundantly encoding weight-space choices that are equivalent up to symmetry: many distinct weight settings implement the same function. By applying bits-back coding to the rotational symmetries exposed by SliceGPT-style preprocessing, it recovers a few percent of model size essentially "for free," without changing the model's outputs.
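A minimal sketch of why rotations are redundant description choices (a toy linear map, not the paper's actual pipeline): for any orthogonal matrix Q, rotating one weight matrix by Q and the next by its transpose yields a different weight file that computes exactly the same function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer linear network: f(x) = x @ W1 @ W2.
d = 8
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))
x = rng.standard_normal((3, d))

# Any orthogonal Q gives an equivalent description of the same function,
# because Q @ Q.T = I cancels between the layers.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix

y_original = x @ W1 @ W2
y_rotated = x @ (W1 @ Q) @ (Q.T @ W2)  # different stored weights, same map

assert np.allclose(y_original, y_rotated)
```

Every choice of Q is a different byte string for the same model; a naive format spends bits pinning down that arbitrary choice.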
Why this matters for Parameter Golf
This is unusually valuable because it changes the question from "how do we distort the model less?" to "which bits are redundant even before distortion starts?" Under a hard artifact cap, a free 3–5% size reduction can be serious margin.
What to import
- Symmetry is a storage opportunity.
- Some bytes are redundant because the model has multiple equivalent descriptions.
- Training-free savings matter when margins are thin.
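The "multiple equivalent descriptions" point is what bits-back exploits: the free choice among equivalent weight files can itself carry information. Here is a toy version using a simpler discrete symmetry (per-unit sign flips in a tanh network, one redundant bit per hidden unit) rather than the paper's continuous rotations; the canonicalization and decoding scheme below are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid, d_out = 4, 8, 2
W1 = rng.standard_normal((d_in, d_hid))
W2 = rng.standard_normal((d_hid, d_out))
x = rng.standard_normal((5, d_in))

def f(W1, W2, x):
    return np.tanh(x @ W1) @ W2   # tanh is odd: tanh(-z) = -tanh(z)

# Canonical form: make each hidden unit's first incoming weight positive.
c = np.sign(W1[0])
W1c, W2c = W1 * c, W2 * c[:, None]   # same function, canonical signs

# Bits-back: re-spend the d_hid redundant sign choices on a message.
message = rng.integers(0, 2, size=d_hid)
s = 1 - 2 * message                   # map {0,1} -> {+1,-1}
W1e, W2e = W1c * s, W2c * s[:, None]  # flip unit j's in/out weights

assert np.allclose(f(W1, W2, x), f(W1e, W2e, x))  # function unchanged
decoded = (W1e[0] < 0).astype(int)    # deviation from canonical = bit
assert np.array_equal(decoded, message)
```

The network's behavior is identical either way, so the 8 message bits ride along at zero distortion cost; that is the "redundant description bits" the paper recovers at much larger scale.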
What not to over-import
The demonstrated gains are modest, and the method depends on a specific symmetry-exposing setup. It is not proof that all compact artifacts have huge hidden symmetry reserves. The lasting lesson is the storage mindset: some bytes can be recovered without changing the model’s function at all.
Best synthesis links
- Gives concrete support to Symmetry-transport weights.
- Extends Rate-distortion for artifact caps with a “zero-distortion savings” angle.
- Suggests an interesting complement to quantization: remove redundant description bits before fighting over distortion bits.
Parameter Golf translation
This suggests a practical new question:
- what parts of our artifact are genuinely informative,
- and what parts are only one arbitrary coordinate choice among equivalent ones?
That question may be especially powerful in shared or transformed architectures.