Sources: arXiv:2504.16968 · alphaXiv overview
Core contribution
BackSlash brings rate-distortion optimization into training itself. Instead of training normally and compressing afterwards, BackSlash adds a rate-constrained objective during optimization, so the learned weights are already biased toward lower description length.
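As a rough illustration (not the paper's exact objective), a rate-constrained loss is just the task loss plus a weighted rate term; the Laplace-style proxy, `scale`, and `lam` below are all assumptions for the sketch:

```python
import numpy as np

def rate_proxy(weights, scale=0.05):
    # Assumed stand-in for a rate term: mean |w| under a Laplace-style
    # prior with the given scale (up to additive constants).
    return float(np.mean(np.abs(weights)) / scale)

def rate_regularized_loss(task_loss, weights, lam=1e-3):
    # Distortion (task loss) plus lambda times the rate proxy, so the
    # optimizer trades task accuracy against description length.
    return task_loss + lam * rate_proxy(weights)
```

Driving `lam` up pushes weights toward zero (cheaper to encode); `lam = 0` recovers ordinary training.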
Why this matters for Parameter Golf
This is one of the strongest paper-level validations of the artifact-native instinct in this garden. It shows that compressibility is not only an export concern: training can directly shape the weight distribution so the final object is both smaller and more robust to later compression.
What to import
- Compression can be a training objective, not just a finishing step.
- The parameter distribution matters. BackSlash explicitly models the weight statistics instead of assuming the weights are simple Gaussians.
- Compression-aware training can improve pruning robustness too. That matters because models that are easier to simplify structurally are often more artifact-friendly overall.
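One concrete way to look at the weight distribution rather than assume it: quantize the weights and measure the empirical entropy of the resulting symbols. A minimal sketch, where the `step` grid size is an assumption:

```python
import numpy as np

def empirical_bits_per_weight(weights, step=0.01):
    # Quantize to a uniform grid, then measure the entropy of the
    # symbol histogram: a crude estimate of achievable bits per weight.
    symbols = np.round(np.asarray(weights) / step).astype(int)
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

A tightly peaked distribution yields fewer bits per weight than a wide one, which is exactly the pressure a rate term applies during training.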
What not to over-import
BackSlash works with rate-style regularization, not the full complexity of a challenge artifact that includes every downstream packing choice. Its rate proxy is suggestive, not identical to real submission bytes.
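To keep that gap honest in practice, one can measure real packed bytes after quantization and compare them to whatever proxy shaped the training; zlib here is an assumed stand-in for the actual submission packer:

```python
import zlib
import numpy as np

def packed_bytes(weights, step=0.01):
    # Ground-truth-ish size: quantize, serialize, DEFLATE. A
    # training-time rate proxy only approximates this number.
    symbols = np.round(np.asarray(weights) / step).astype(np.int16)
    return len(zlib.compress(symbols.tobytes(), level=9))
```

Tracking proxy versus `packed_bytes` over checkpoints shows where the rate surrogate and the real byte count diverge.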
Best synthesis links
- Gives paper support to Artifact-native training.
- Extends Radio from allocation at compression time to pressure during training.
- Pairs naturally with NuMuon, which also tries to make training produce more compressible weights rather than merely compressing after the fact.
Parameter Golf translation
The practical import is to explore finishing objectives that reward:
- low-entropy residuals
- fewer exception families
- compressible parameter distributions
- structures that survive roundtrip well
rather than only optimizing floating-point checkpoint quality.
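Those rewards could be folded into a single hypothetical finishing score; every name and weighting below is an assumption, shown only to illustrate how a rate term (symbol entropy) and roundtrip fidelity can share one objective:

```python
import numpy as np

def finishing_objective(task_loss, weights, step=0.01, lam=0.1, mu=10.0):
    # Hypothetical composite: task quality, plus entropy of the
    # quantized symbols (rate), plus quantize->dequantize error
    # (roundtrip survival). lam and mu are made-up trade-off weights.
    w = np.asarray(weights)
    symbols = np.round(w / step).astype(int)
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    bits = float(-(p * np.log2(p)).sum())
    roundtrip = float(np.mean((w - symbols * step) ** 2))
    return task_loss + lam * bits + mu * roundtrip
```

A checkpoint whose weights already sit on the quantization grid with a peaked symbol distribution pays no penalty; a wide, off-grid distribution does.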
Related
- Artifact-native training
- Radio
- NuMuon
- Rate-distortion for artifact caps
- Training economics and small-model bottlenecks