Sources: arXiv:2603.03597 · alphaXiv overview
Core contribution
NuMuon starts from the observation that Muon-trained models already exhibit an implicit low-rank bias, then makes that bias explicit by constraining update directions with a nuclear-norm budget. The result is an optimizer designed to produce weights that are already more compressible by the time any downstream low-rank compression pipeline is applied.
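The paper's exact mechanism isn't reproduced here, but a minimal sketch of one standard way to enforce a nuclear-norm budget is to project an update's singular values onto an l1 ball (the nuclear norm of a matrix is the l1 norm of its spectrum). The function names `project_l1_ball` and `nuclear_budget_update` and the budget parameter `tau` are hypothetical, illustrative choices, not names from the paper:

```python
import numpy as np

def project_l1_ball(s, tau):
    """Euclidean projection of a nonnegative, descending-sorted vector s
    onto the l1 ball of radius tau (standard sort-and-threshold method)."""
    if s.sum() <= tau:
        return s
    css = np.cumsum(s)
    idx = np.arange(1, len(s) + 1)
    # Largest k such that s_k stays positive after a uniform shift
    k = np.nonzero(s * idx > css - tau)[0][-1]
    theta = (css[k] - tau) / (k + 1)
    return np.maximum(s - theta, 0.0)

def nuclear_budget_update(grad, tau=1.0):
    """Project a raw update matrix onto the nuclear-norm ball of radius tau
    by projecting its singular values onto the l1 ball of radius tau."""
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    s_proj = project_l1_ball(s, tau)
    return U @ np.diag(s_proj) @ Vt
```

Because the projection zeroes out small singular values, a tight budget also tends to lower the rank of each step, which is the structural pressure the note is pointing at.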
Why this matters for Parameter Golf
This is a strong bleeding-edge example of a broader thesis in the garden: the optimizer itself can shape the final artifact. If low-rank structure is one of the cheapest ways to reduce stored bytes, then the training algorithm should help manufacture that structure rather than leaving it to post-hoc surgery.
What to import
- Compressibility can be an optimizer property.
- Low-rank friendliness can be induced during training, not merely extracted later.
- Update-shape constraints may matter as much as architecture when the target is final artifact size.
What not to over-import
NuMuon is specialized around low-rank structure and its current validation is tied to SVD-style downstream compression. It does not prove that all useful compact artifacts should be low-rank. The durable lesson is that optimizer design can target future artifact structure.
Best synthesis links
- Strongly supports Artifact-native training.
- Complements BackSlash: BackSlash shapes compressibility through rate-style pressure, NuMuon through optimizer geometry.
- Connects to ReALLM and LittleBit by reinforcing the idea that structured weights are easier to compress than arbitrary dense ones.
Parameter Golf translation
A local translation is to treat optimizer and training-schedule choices as part of the compression stack. If a training rule consistently yields weights that admit smaller low-rank residuals or cleaner shared bases, that may be worth more to the final artifact size than a slightly lower-loss floating-point checkpoint.
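One concrete way to act on this locally is to measure the low-rank residual directly: by Eckart-Young, the best rank-r approximation error is the Frobenius norm of the tail singular values. A minimal sketch (the function name `low_rank_residual` is an illustrative choice, not from the paper):

```python
import numpy as np

def low_rank_residual(W, r):
    """Relative Frobenius error of the best rank-r approximation of W.
    By Eckart-Young this is ||tail singular values|| / ||W||_F."""
    s = np.linalg.svd(W, compute_uv=False)
    tail = s[r:]
    return float(np.sqrt(np.sum(tail ** 2)) / np.linalg.norm(W))
```

Tracking this number across checkpoints trained under different rules gives a cheap proxy for "low-rank friendliness" before committing to any downstream compression pipeline.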
Related
- BackSlash
- ReALLM
- Artifact-native training
- Rate-distortion for artifact caps
- Training economics and small-model bottlenecks