Moonshot: artifact-native training

Train the model against the shape of the final compressed artifact, not just against task loss plus fake quantization.

The central bet is that many current methods still optimize the wrong object:

  • train a checkpoint
  • maybe inject quantization noise
  • compress it later

Artifact-native training says the thing we should optimize is closer to:

a model whose weights naturally collapse into low-entropy, reusable, codec-friendly structure after the exact export path we care about.

Why this is outside the current prior

Even strong quantization-aware work usually stops at the quantizer boundary. It does not explicitly ask whether the resulting object is easy to serialize as a compact, self-contained artifact.

This moonshot changes the target from:

  • good low-bit checkpoint

to:

  • good final compressed object

Mechanism sketch

Add cheap differentiable or semi-differentiable proxies during training for things like:

  • residual entropy after coarse quantization
  • number of distinct exception families
  • codebook reuse rate
  • compressibility of protected subsets
  • metadata complexity of the exception path
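Below is a minimal sketch of two of these proxies. They are written as plain, non-differentiable measurements that could be used to monitor a run or to check a smoother surrogate. Everything here is an illustrative assumption: the helper names, the uniform coarse grid, measuring residuals on a finer grid, and defining codebook reuse as the fraction of fixed-size code groups that repeat an earlier group.

```python
import math
from collections import Counter

# Hypothetical helpers: none of these names come from the note. They only
# illustrate two of the listed proxies as plain (non-differentiable) measurements.

def coarse_codes(weights, step):
    """Round each weight to the nearest multiple of `step`, returning integer codes."""
    return [round(w / step) for w in weights]

def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def residual_entropy_bits(weights, coarse_step, fine_step):
    """Entropy of what coarse quantization leaves behind, measured on a finer grid."""
    codes = coarse_codes(weights, coarse_step)
    residuals = [w - q * coarse_step for w, q in zip(weights, codes)]
    return entropy_bits(coarse_codes(residuals, fine_step))

def codebook_reuse_rate(weights, step, group_size):
    """Fraction of full code groups that repeat an earlier group (0.0 = no reuse).

    A trailing partial group is ignored.
    """
    codes = coarse_codes(weights, step)
    groups = [
        tuple(codes[i:i + group_size])
        for i in range(0, len(codes) - group_size + 1, group_size)
    ]
    if not groups:
        return 0.0
    return 1.0 - len(set(groups)) / len(groups)
```

A weight tensor built from one repeated pattern scores high reuse and zero residual entropy. A tensor with no repeated groups scores zero reuse.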

Then regularize the model toward structures that give better quality per stored byte after the full export roundtrip.
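One way the regularizer could look, as a hedged sketch, has three parts. The first is a soft-histogram entropy over fixed bin centers. This is a smooth relaxation of the hard histogram entropy and is not a method taken from this note. The second adds that entropy to the task loss with a small weight. The third is a byte count that uses int8 codes and zlib as a stand-in for the real export path. The bin centers, temperature, `lam`, the int8 clipping and the use of zlib are all assumptions, and the actual export path may differ.

```python
import math
import struct
import zlib

# Hypothetical sketch: a "soft histogram" entropy over fixed bin centers.
# It depends on the weights only through exp/log, so an autodiff framework could
# differentiate it; here it is written in plain Python for clarity.

def soft_entropy_bits(weights, centers, temperature):
    """Entropy (bits) of the average soft assignment of weights to bin centers."""
    mass = [0.0] * len(centers)
    for w in weights:
        logits = [-(w - c) ** 2 / temperature for c in centers]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]  # numerically stabilized softmax
        total = sum(exps)
        for j, e in enumerate(exps):
            mass[j] += e / total
    n = len(weights)
    return -sum(m / n * math.log2(m / n) for m in mass if m > 0)

def regularized_loss(task_loss, weights, centers, temperature, lam):
    """Task loss plus a weak pull toward low-entropy weight structure."""
    return task_loss + lam * soft_entropy_bits(weights, centers, temperature)

def packed_bytes(weights, step):
    """Bytes after int8 coding and zlib: an assumed stand-in for the real export path."""
    codes = [max(-128, min(127, round(w / step))) for w in weights]
    payload = struct.pack(f"{len(codes)}b", *codes)
    return len(zlib.compress(payload, 9))
```

With a low temperature the soft entropy approaches the hard histogram entropy. A higher temperature gives smoother gradients but a looser proxy. `packed_bytes` is meant as a post-hoc check on whether the proxy tracks real bytes, not as something to differentiate.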

Why it might matter for Parameter Golf

Parameter Golf is already an artifact-first challenge. If the winning object is evaluated only after the real export path, then checkpoint quality is an intermediate variable, not the target.

This moonshot says we should make the model want to become a compact artifact.

Cheapest falsifier

  • define one or two export-complexity proxies
  • train with a weak regularizer toward them
  • check whether proxy gains produce better post-roundtrip val_bpb

Kill it if the proxies improve but the real artifact metric does not.
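The kill rule can be written as a small decision function. This is a sketch under assumptions: `RunResult` and `falsifier_verdict` are hypothetical names, lower proxy and lower val_bpb are both taken to be better, and both runs must fit the same byte budget so that the comparison is made at fixed bytes.

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    """Hypothetical record of one run, measured after the full export roundtrip."""
    proxy: float          # export-complexity proxy; lower assumed better
    val_bpb: float        # post-roundtrip validation bits per byte; lower is better
    artifact_bytes: int   # size of the final packed artifact

def falsifier_verdict(baseline, candidate, byte_budget, min_gain=0.0):
    """Apply the kill rule: proxy gains must show up as post-roundtrip val_bpb gains."""
    # Compare only at fixed bytes: both artifacts must fit the same budget.
    if max(baseline.artifact_bytes, candidate.artifact_bytes) > byte_budget:
        return "invalid: over byte budget"
    proxy_improved = candidate.proxy < baseline.proxy
    bpb_improved = baseline.val_bpb - candidate.val_bpb > min_gain
    if proxy_improved and not bpb_improved:
        return "kill"
    if proxy_improved and bpb_improved:
        return "keep"
    return "inconclusive: proxy did not improve"
```

`min_gain` lets a run count as an improvement only when val_bpb moves by more than the noise between seeds. How large that noise is would have to be measured, not assumed.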

What would make it real

  • post-roundtrip quality improves at fixed bytes
  • the model develops more repeated, structured exceptions rather than random fragile ones
  • gains survive final packing rather than appearing only in pre-export analysis