Sources: arXiv:2504.16968 · alphaXiv overview
Core contribution
BackSlash brings rate-distortion optimization into training itself. Instead of training normally and compressing afterwards, BackSlash adds a rate-constrained objective during optimization, so the learned weights are already biased toward lower description length.
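As a rough illustration (not the paper's exact objective), a rate-constrained loss is just the task loss plus a weighted rate term; the Laplace-style proxy, `scale`, and `lam` below are all assumptions for the sketch:

```python
import numpy as np

def rate_proxy(weights, scale=0.05):
    # Assumed stand-in for a rate term: mean |w| under a Laplace-style
    # prior with the given scale (up to additive constants).
    return float(np.mean(np.abs(weights)) / scale)

def rate_regularized_loss(task_loss, weights, lam=1e-3):
    # Distortion (task loss) plus lambda times the rate proxy, so the
    # optimizer trades task accuracy against description length.
    return task_loss + lam * rate_proxy(weights)
```

Driving `lam` up pushes weights toward zero (cheaper to encode); `lam = 0` recovers ordinary training.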
Why this matters for Parameter Golf
This is one of the strongest paper-level validations of the artifact-native instinct in this garden. It shows that compressibility is not only an export concern: training can directly shape the weight distribution so the final object is both smaller and more robust to later compression.
What to import
- Compression can be a training objective, not just a finishing step.
- The parameter distribution matters. BackSlash explicitly models the weight statistics instead of assuming the weights are simple Gaussians.
- Compression-aware training can improve pruning robustness too. That matters because models that are easier to simplify structurally are often more artifact-friendly overall.
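One concrete way to look at the weight distribution rather than assume it: quantize the weights and measure the empirical entropy of the resulting symbols. A minimal sketch, where the `step` grid size is an assumption:

```python
import numpy as np

def empirical_bits_per_weight(weights, step=0.01):
    # Quantize to a uniform grid, then measure the entropy of the
    # symbol histogram: a crude estimate of achievable bits per weight.
    symbols = np.round(np.asarray(weights) / step).astype(int)
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

A tightly peaked distribution yields fewer bits per weight than a wide one, which is exactly the pressure a rate term applies during training.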
What not to over-import
BackSlash works with rate-style regularization, not the full complexity of a challenge artifact that includes every downstream packing choice. Its rate proxy is suggestive, not identical to real submission bytes.
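To keep that gap honest in practice, one can measure real packed bytes after quantization and compare them to whatever proxy shaped the training; zlib here is an assumed stand-in for the actual submission packer:

```python
import zlib
import numpy as np

def packed_bytes(weights, step=0.01):
    # Ground-truth-ish size: quantize, serialize, DEFLATE. A
    # training-time rate proxy only approximates this number.
    symbols = np.round(np.asarray(weights) / step).astype(np.int16)
    return len(zlib.compress(symbols.tobytes(), level=9))
```

Tracking proxy versus `packed_bytes` over checkpoints shows where the rate surrogate and the real byte count diverge.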
Best synthesis links
- Gives paper support to Artifact-native training.
- Extends Radio from allocation at compression time to pressure during training.
- Pairs naturally with NuMuon, which also tries to make training produce more compressible weights rather than merely compressing after the fact.
Parameter Golf translation
The practical import is to explore finishing objectives that reward:
- low-entropy residuals
- fewer exception families
- compressible parameter distributions
- structures that survive roundtrip well
rather than only optimizing floating-point checkpoint quality.
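Those rewards could be folded into a single hypothetical finishing score; every name and weighting below is an assumption, shown only to illustrate how a rate term (symbol entropy) and roundtrip fidelity can share one objective:

```python
import numpy as np

def finishing_objective(task_loss, weights, step=0.01, lam=0.1, mu=10.0):
    # Hypothetical composite: task quality, plus entropy of the
    # quantized symbols (rate), plus quantize->dequantize error
    # (roundtrip survival). lam and mu are made-up trade-off weights.
    w = np.asarray(weights)
    symbols = np.round(w / step).astype(int)
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    bits = float(-(p * np.log2(p)).sum())
    roundtrip = float(np.mean((w - symbols * step) ** 2))
    return task_loss + lam * bits + mu * roundtrip
```

A checkpoint whose weights already sit on the quantization grid with a peaked symbol distribution pays no penalty; a wide, off-grid distribution does.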
Related
- Artifact-native training
- Radio
- NuMuon
- Rate-distortion for artifact caps
- Training economics and small-model bottlenecks