(Shao et al., 2025)

Sources: arXiv:2505.10202 · alphaXiv overview

Core contribution

VQ-Logits attacks the output bottleneck directly: it replaces the full vocabulary-sized logits projection with a compact vector-quantized codebook. The key claim is that the output head can often be compressed more structurally than plain low-rank factorization or tied embeddings suggest, because many vocabulary items can share a small predictive basis.
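As a concrete sketch of the idea: instead of a dense V × d output projection, keep a K × d codebook (K ≪ V) plus a length-V index mapping each vocabulary token to a code, so tokens assigned to the same code share a logit. Everything below is illustrative, not the paper's implementation:

```python
def vq_logits(h, codebook, code_index):
    """Toy vector-quantized output head (hypothetical, for intuition only).

    h          : hidden state, a list of d floats
    codebook   : K code vectors, each a list of d floats (K << vocab size)
    code_index : length-V list mapping each vocab id to a code id
    """
    # One dot product per code instead of one per vocabulary token.
    code_logits = [sum(hi * ci for hi, ci in zip(h, code)) for code in codebook]
    # Scatter code logits back to the full vocabulary: tokens that share
    # a code receive the same logit.
    return [code_logits[c] for c in code_index]
```

Parameter cost drops from V·d weights to K·d weights plus V small integers for the mapping; whether that wins after all overheads are counted is exactly the bookkeeping caveat discussed later in this note.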

Why this matters for Parameter Golf

This is one of the clearest papers on the shelf for output-head compression. It makes the output side feel like a real artifact-budget lever rather than a theoretical annoyance. In a tiny model, saving head bytes can be as meaningful as shaving another fraction of a bit from the trunk.

What to import

  • The logits projection is a compressible structure, not just a fixed tax.
  • A compact codebook may buy better head tradeoffs than naive low-rank factorization alone.
  • Vocabulary semantics and output parameterization should be designed together.

What not to over-import

The paper does not prove that codebook-style output compression wins after all bookkeeping, mapping, and implementation overheads are counted inside a strict challenge artifact. The stable lesson is that the head deserves its own structured compression search, not that any specific VQ scheme is automatically byte-optimal.

Parameter Golf translation

VQ-Logits motivates asking:

  • how many bytes are tied up in the head versus the trunk,
  • whether head restructuring can buy more than another round of backbone compression,
  • and whether tokenizer changes should be evaluated jointly with output-head compression schemes.
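A back-of-envelope sketch of the first question, with all sizes, dtypes, and the code count purely illustrative:

```python
def full_head_bytes(vocab_size, d_model, bytes_per_param=2):
    """Bytes for a dense V x d logits projection (fp16 weights assumed)."""
    return vocab_size * d_model * bytes_per_param

def vq_head_bytes(num_codes, d_model, vocab_size,
                  bytes_per_param=2, bytes_per_index=2):
    """Bytes for a K x d codebook plus a uint16 token-to-code index."""
    return num_codes * d_model * bytes_per_param + vocab_size * bytes_per_index

# Hypothetical numbers: 50k vocab, d = 512, K = 1024 codes.
full = full_head_bytes(50_000, 512)     # 51,200,000 bytes (~51 MB)
vq = vq_head_bytes(1_024, 512, 50_000)  # 1,148,576 bytes (~1.1 MB)
```

At these (made-up) sizes the dense head would dominate a small artifact's budget, which is why the head merits its own structured compression search before another round of backbone squeezing.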

Reference

Shao, J., Huang, H., Wu, J., Cheng, Y., Wu, Z., Shan, Y., & Zheng, M. (2025). VQ-Logits: Compressing the Output Bottleneck of Large Language Models via Vector Quantized Logits. arXiv preprint arXiv:2505.10202. https://arxiv.org/abs/2505.10202