Moonshot
Train the model against the shape of the final compressed artifact, not just against task loss plus fake quantization.
The central bet is that many current methods still optimize the wrong object:
- train a checkpoint
- maybe inject quantization noise
- compress it later
Artifact-native training says the thing we should optimize is closer to:
a model whose weights naturally collapse into low-entropy, reusable, codec-friendly structure after the exact export path we care about.
Why this is outside the current prior
Even strong quantization-aware work usually stops at the quantizer boundary. It does not explicitly ask whether the resulting object is easy to serialize as a compact self-contained artifact.
This moonshot changes the target from:
- good low-bit checkpoint
to:
- good final compressed object
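The difference between the two targets can be made concrete with a toy export path. The sketch below (assumed here: symmetric 4-bit quantization followed by a generic byte codec; this is a stand-in, not a prescribed format) measures what the "final compressed object" target actually scores, the serialized bytes after export, rather than any property of the raw checkpoint:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)

# Raw checkpoint bytes: what "good low-bit checkpoint" work implicitly tracks.
raw_bytes = len(weights.tobytes())

# Artifact bytes: coarse symmetric 4-bit quantization, then a generic codec.
# This pair of steps is the "exact export path" the objective should see.
scale = np.abs(weights).max() / 7.0            # map weights into [-7, 7]
codes = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
artifact_bytes = len(zlib.compress(codes.tobytes(), level=9))

print(raw_bytes, artifact_bytes)
```

Under the artifact-first view, `artifact_bytes` (together with post-roundtrip quality) is the score; `raw_bytes` and even the quantized tensor are intermediate variables.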
Mechanism sketch
Add cheap differentiable or semi-differentiable proxies during training for things like:
- residual entropy after coarse quantization
- number of distinct exception families
- codebook reuse rate
- compressibility of protected subsets
- metadata complexity of the exception path
Then regularize the model toward structures with a better byte return after the full export roundtrip.
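One way the first proxy could look in practice, sketched below under assumptions not fixed by the text (a uniform quantization grid, soft bin assignment via a softmax with temperature `tau`, and the hypothetical name `residual_entropy_proxy`): estimate the entropy of bin usage after coarse quantization in a differentiable way, so it can be added to the task loss with a small weight.

```python
import torch

def residual_entropy_proxy(w, step=0.05, tau=0.5, n_bins=33):
    # Hypothetical differentiable proxy for "residual entropy after coarse
    # quantization": soft-assign each weight to a fixed uniform grid, then
    # take the entropy of the marginal bin-usage distribution. Minimizing it
    # pushes weights to reuse few grid points, i.e. toward codec-friendly,
    # low-entropy structure.
    centers = (torch.arange(n_bins) - n_bins // 2).float() * step
    d = (w.reshape(-1, 1) - centers) ** 2        # squared distance to each bin
    soft = torch.softmax(-d / tau, dim=1)        # soft bin membership
    p = soft.mean(dim=0) + 1e-9                  # marginal bin usage
    return -(p * p.log()).sum()                  # differentiable entropy estimate

w = torch.randn(1024, requires_grad=True)
loss = residual_entropy_proxy(w)
loss.backward()                                  # gradient reaches the weights
```

In training this would appear as `task_loss + lam * residual_entropy_proxy(w)` with a weak `lam`, matching the "weak regularizer" framing in the falsifier below; the hard quantizer stays in the export path, the soft version only shapes gradients.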
Why it might matter for Parameter Golf
Parameter Golf is already an artifact-first challenge. If the winning object is evaluated only after the real export path, then checkpoint quality is an intermediate variable, not the target.
This moonshot says we should make the model want to become a compact artifact.
Cheapest falsifier
- define one or two export-complexity proxies
- train with a weak regularizer toward them
- check whether proxy gains produce better post-roundtrip val_bpb
Kill it if the proxies improve but the real artifact metric does not.
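The measurement harness this falsifier needs is just a faithful roundtrip: run the export path, count bytes, decode, and score. A minimal sketch, assuming a quantize-pack-compress export and using a grid-snapped copy of the weights as a synthetic stand-in for a proxy-regularized checkpoint (no training is actually run here):

```python
import zlib
import numpy as np

def roundtrip(w, step=0.05):
    # Stand-in export path: quantize to a uniform grid, pack, compress,
    # then decode back. Returns the decoded weights and the artifact size.
    codes = np.round(w / step).astype(np.int16)
    blob = zlib.compress(codes.tobytes(), level=9)
    decoded = np.frombuffer(zlib.decompress(blob), dtype=np.int16)
    return decoded.astype(np.float32) * step, len(blob)

rng = np.random.default_rng(0)
w_base = rng.normal(0.0, 0.2, size=4096).astype(np.float32)
# Synthetic "regularized" weights: snapped to a coarser grid, standing in
# for a checkpoint whose proxies improved (fewer distinct code values).
w_reg = (np.round(w_base / 0.1) * 0.1).astype(np.float32)

dec_base, bytes_base = roundtrip(w_base)
dec_reg, bytes_reg = roundtrip(w_reg)
print(bytes_base, bytes_reg)
```

The real experiment replaces the synthetic snap with actual proxy-regularized training and replaces reconstruction error with post-roundtrip val_bpb; the kill criterion is then a straight comparison of the two runs on that metric at matched bytes.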
What would make it real
- post-roundtrip quality improves at fixed bytes
- the model develops more repeated, structured exceptions rather than random fragile ones
- gains survive final packing rather than appearing only in pre-export analysis