Sources: arXiv:2405.13155 · alphaXiv overview
Core contribution
ReALLM treats pretrained matrices as something richer than raw tensors to be scalar-quantized. It decomposes each matrix into a high-precision residual path plus a compressed latent representation decoded by a small shared neural decoder, with per-matrix choices of latent shape, bit budget, and residual structure.
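The decomposition can be sketched in a few lines of numpy. This is an illustration of the idea only, not ReALLM's actual architecture: the shared decoder here is a fixed random linear map standing in for the paper's small neural decoder, and the latent is fit by least squares rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: one 64x64 weight matrix, an 8-dim latent per matrix.
W = rng.standard_normal((64, 64))

# Stand-in for the shared decoder: a fixed linear map from latent to matrix.
decoder = rng.standard_normal((8, 64 * 64)) / np.sqrt(8)

def decode(z):
    """Shared reconstruction mechanism, amortized across matrices."""
    return (z @ decoder).reshape(64, 64)

# Fit the tiny latent code by least squares against the flattened weights.
z, *_ = np.linalg.lstsq(decoder.T, W.reshape(-1), rcond=None)

# High-precision residual path: whatever the shared structure cannot explain.
R = W - decode(z)

# The split is lossless if the residual is stored at full precision;
# compression comes from quantizing R and z aggressively.
assert np.allclose(decode(z) + R, W)
```

The point of the split is that `z` is tiny and `R` is what is left after the shared structure has absorbed the regularity, so each part can get its own bit budget.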
Why this matters for Parameter Golf
This is interesting less as a direct drop-in method and more as a challenge to the default representation. It argues that some tensors are compressible because of their spatial or structural regularity, and that a shared decoder plus tiny latent codes can sometimes beat treating every matrix as an unstructured bag of numbers.
What to import
- Representation format is a design variable. The model artifact does not have to be “just quantized matrices.”
- Residual paths matter. A compact main representation plus a small high-fidelity correction can be a better compromise than globally raising precision.
- Different matrices deserve different schemes. Early blocks and deeper blocks may want different compression interfaces.
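The residual-path point above can be made concrete with a toy experiment: quantize a matrix coarsely, then spend a small fixed byte budget patching only the worst-reconstructed entries at full precision. The 3-bit grid and the budget `k` are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((32, 32))

# Coarse main representation: uniform 3-bit quantization over the value range.
lo, hi = W.min(), W.max()
levels = 2**3 - 1
Wq = np.round((W - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

# Residual correction: keep only the k largest errors at full precision.
err = W - Wq
k = 64  # hypothetical byte budget for the correction path
idx = np.unravel_index(np.argsort(np.abs(err), axis=None)[-k:], err.shape)
Wc = Wq.copy()
Wc[idx] = W[idx]  # patch the worst entries, leave the rest coarse

# The patched matrix is strictly closer to W than raising precision nowhere,
# at a cost of k high-fidelity values rather than a globally higher bit width.
assert np.abs(W - Wc).mean() < np.abs(W - Wq).mean()
```

This is the compromise the bullet describes: a compact main representation plus a small targeted correction, instead of paying for extra bits everywhere.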
What not to over-import
ReALLM is operationally heavy and tuned for pretrained-weight compression, not for a tiny local search loop. It also pays for decoder machinery, so the nominal bit savings only matter if the shared decoder and its side information stay cheap enough in the final artifact.
Best synthesis links
- Extends Additive Quantization and ClusComp from “better codes” toward “better representations.”
- Supports Entropy-friendly model structure by showing that matrix regularity can be worth modeling directly.
- Suggests a richer form of decoupled precision where the protected path is not only fp16 passthrough but a tiny residual model.
Parameter Golf translation
The strongest import is not “train an auto-decoder right now.” It is:
- look for tensors with repeated local structure
- use one cheap shared reconstruction mechanism where possible
- reserve high-fidelity bytes for the residual that the shared structure cannot explain
That is especially interesting for tied or repeated architectures, where one reconstruction mechanism may amortize across many reused blocks.
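The amortization claim can be sketched directly. Assume (hypothetically) four repeated blocks that share most of their structure: store one shared reconstruction once, and per block only the small residual it actually needs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical: 4 reused blocks that are small perturbations of one template.
base = rng.standard_normal((16, 16))
blocks = [base + 0.05 * rng.standard_normal((16, 16)) for _ in range(4)]

# One shared reconstruction mechanism: here simply the mean block, stored once.
shared = np.mean(blocks, axis=0)

# Per-block payload: only the residual the shared structure cannot explain.
residuals = [b - shared for b in blocks]

# Residual energy is far below raw block energy, so each residual can be
# stored in far fewer bits; the shared part is amortized across all blocks.
per_block_var = [float(np.var(r)) for r in residuals]
assert all(v < float(np.var(blocks[0])) for v in per_block_var)
```

The more blocks reuse the shared mechanism, the cheaper it becomes per block, which is exactly why tied or repeated architectures are the natural first target.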