(Dolatabadi et al., 2026)

Sources: arXiv:2603.03597 · alphaXiv overview

Core contribution

NuMuon starts from the observation that Muon-trained models already show an implicit low-rank bias, then makes that bias explicit by constraining update directions with a nuclear-norm budget. The result is an optimizer meant to train models whose weights are more compressible before any downstream low-rank compression pipeline is applied.
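
One way to picture a nuclear-norm budget on the update direction, as a minimal sketch and not the paper's actual update rule: Muon's orthogonalized direction U V^T has every singular value equal to 1, so its nuclear norm equals its rank. If the direction must also have nuclear norm at most an integer k, while its spectral norm stays at most 1, the direction best aligned with the gradient keeps only the top k singular pairs. The function names, the exact SVD (standing in for Muon's Newton-Schulz iteration), and the integer budget are all assumptions made for illustration.

```python
import numpy as np

def muon_style_direction(grad):
    """Orthogonalized direction U V^T; exact SVD stands in for Newton-Schulz."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt

def nuclear_budget_direction(grad, budget):
    """Hypothetical budgeted direction: keep the top `budget` singular pairs.

    For an integer budget k this maximizes <grad, X> subject to
    spectral_norm(X) <= 1 and nuclear_norm(X) <= k, so rank(X) <= k.
    """
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    k = min(int(budget), u.shape[1])
    return u[:, :k] @ vt[:k, :]

rng = np.random.default_rng(0)
g = rng.standard_normal((64, 32))          # stand-in for a momentum/gradient matrix
full = muon_style_direction(g)
low = nuclear_budget_direction(g, budget=4)

# Nuclear norm = sum of singular values; compare the two directions.
print(np.linalg.svd(full, compute_uv=False).sum())
print(np.linalg.svd(low, compute_uv=False).sum())
print(np.linalg.matrix_rank(low))
```

Under these assumptions, every step adds at most `budget` new singular directions to the weights, which is the sense in which the low-rank bias becomes explicit instead of incidental.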

Why this matters for Parameter Golf

This is a strong bleeding-edge example of a broader thesis in the garden: the optimizer itself can shape the final artifact. If low-rank structure is one of the cheapest ways to reduce stored bytes, then the training algorithm should help manufacture that structure rather than leaving it to post-hoc surgery.
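
The byte argument can be made concrete with a small count. A dense m × n matrix stores m·n values, while a rank-r factorization stores r·(m + n), so the factorization saves space only when r is below m·n / (m + n). The layer shape and rank below are hypothetical, chosen only to show the arithmetic.

```python
def dense_params(m, n):
    """Values stored by a dense m x n weight matrix."""
    return m * n

def low_rank_params(m, n, r):
    """Values stored by two factors of shape (m x r) and (r x n)."""
    return r * (m + n)

def break_even_rank(m, n):
    """Largest integer rank r with r * (m + n) < m * n."""
    return (m * n - 1) // (m + n)

m, n = 1024, 4096                      # hypothetical layer shape
print(dense_params(m, n))              # 4194304
print(low_rank_params(m, n, 64))       # 327680
print(break_even_rank(m, n))           # 819
```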

What to import

  • Compressibility can be an optimizer property.
  • Low-rank friendliness can be induced during training, not merely extracted later.
  • Update-shape constraints may matter as much as architecture when the target is final artifact size.

What not to over-import

NuMuon is specialized around low-rank structure and its current validation is tied to SVD-style downstream compression. It does not prove that all useful compact artifacts should be low-rank. The durable lesson is that optimizer design can target future artifact structure.

Connections

  • Strongly supports Artifact-native training.
  • Complements BackSlash: one shapes compressibility through rate-style pressure, the other through optimizer geometry.
  • Connects to ReALLM and LittleBit by reinforcing the idea that structured weights are easier to compress than arbitrary dense ones.

Parameter Golf translation

A local translation is to treat optimizer and training-schedule choices as part of the compression stack. If a training rule consistently yields weights that admit smaller low-rank residuals or cleaner shared bases, that may be worth more than a checkpoint that scores slightly better in full precision but compresses poorly.
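
A minimal sketch of how "smaller low-rank residuals" could be measured when comparing training rules: the relative Frobenius error left after the best rank-r approximation of a weight matrix. The metric name, the synthetic matrices, and the rank are assumptions for illustration; they are not taken from the paper.

```python
import numpy as np

def low_rank_residual(w, rank):
    """Relative Frobenius error of the best rank-`rank` approximation of w.

    By the Eckart-Young theorem, this is the energy in the discarded
    singular values divided by the total energy.
    """
    s = np.linalg.svd(w, compute_uv=False)
    tail = np.sqrt(np.sum(s[rank:] ** 2))
    return tail / np.sqrt(np.sum(s ** 2))

rng = np.random.default_rng(0)
# Stand-in for weights with no particular structure.
dense = rng.standard_normal((128, 64))
# Stand-in for weights that are close to rank 8, plus small noise.
structured = (rng.standard_normal((128, 8)) @ rng.standard_normal((8, 64))
              + 0.01 * rng.standard_normal((128, 64)))

print(low_rank_residual(dense, 8))
print(low_rank_residual(structured, 8))
```

Tracking a number like this per layer, at a fixed rank budget, would let two training recipes be compared on compressibility alongside loss.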

Dolatabadi, H. M., Ajanthan, T., Ramasinghe, S., Hewa Koneputugodage, C. P., Siriwardhana, S., Shevchenko, V., Pajak, K., Snewin, J., Avraham, G., & Long, A. (2026). NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training. arXiv preprint arXiv:2603.03597. https://arxiv.org/abs/2603.03597