Moonshot
Train the model against the shape of the final compressed artifact, not just against task loss plus fake quantization.
The central bet is that many current methods still optimize the wrong object:
- train a checkpoint
- maybe inject quantization noise
- compress it later
Artifact-native training says the thing we should optimize is closer to:
a model whose weights naturally collapse into low-entropy, reusable, codec-friendly structure after the exact export path we care about.
Why this is outside the current prior
Even strong quantization-aware work usually stops at the quantizer boundary. It does not explicitly ask whether the resulting object is easy to serialize as a compact self-contained artifact.
This moonshot changes the target from:
- good low-bit checkpoint
to:
- good final compressed object
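The difference between the two targets can be made concrete with a toy export path. The sketch below (assumed here: symmetric 4-bit quantization followed by a generic byte codec; this is a stand-in, not a prescribed format) measures what the "final compressed object" target actually scores, the serialized bytes after export, rather than any property of the raw checkpoint:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)

# Raw checkpoint bytes: what "good low-bit checkpoint" work implicitly tracks.
raw_bytes = len(weights.tobytes())

# Artifact bytes: coarse symmetric 4-bit quantization, then a generic codec.
# This pair of steps is the "exact export path" the objective should see.
scale = np.abs(weights).max() / 7.0            # map weights into [-7, 7]
codes = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
artifact_bytes = len(zlib.compress(codes.tobytes(), level=9))

print(raw_bytes, artifact_bytes)
```

Under the artifact-first view, `artifact_bytes` (together with post-roundtrip quality) is the score; `raw_bytes` and even the quantized tensor are intermediate variables.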
Mechanism sketch
Add cheap differentiable or semi-differentiable proxies during training for things like:
- residual entropy after coarse quantization
- number of distinct exception families
- codebook reuse rate
- compressibility of protected subsets
- metadata complexity of the exception path
Then regularize the model toward structures with a better byte return after the full export roundtrip.
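One way the first proxy could look in practice, sketched below under assumptions not fixed by the text (a uniform quantization grid, soft bin assignment via a softmax with temperature `tau`, and the hypothetical name `residual_entropy_proxy`): estimate the entropy of bin usage after coarse quantization in a differentiable way, so it can be added to the task loss with a small weight.

```python
import torch

def residual_entropy_proxy(w, step=0.05, tau=0.5, n_bins=33):
    # Hypothetical differentiable proxy for "residual entropy after coarse
    # quantization": soft-assign each weight to a fixed uniform grid, then
    # take the entropy of the marginal bin-usage distribution. Minimizing it
    # pushes weights to reuse few grid points, i.e. toward codec-friendly,
    # low-entropy structure.
    centers = (torch.arange(n_bins) - n_bins // 2).float() * step
    d = (w.reshape(-1, 1) - centers) ** 2        # squared distance to each bin
    soft = torch.softmax(-d / tau, dim=1)        # soft bin membership
    p = soft.mean(dim=0) + 1e-9                  # marginal bin usage
    return -(p * p.log()).sum()                  # differentiable entropy estimate

w = torch.randn(1024, requires_grad=True)
loss = residual_entropy_proxy(w)
loss.backward()                                  # gradient reaches the weights
```

In training this would appear as `task_loss + lam * residual_entropy_proxy(w)` with a weak `lam`, matching the "weak regularizer" framing in the falsifier below; the hard quantizer stays in the export path, the soft version only shapes gradients.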
Why it might matter for Parameter Golf
Parameter Golf is already an artifact-first challenge. If the winning object is evaluated only after the real export path, then checkpoint quality is an intermediate variable, not the target.
This moonshot says we should make the model want to become a compact artifact.
Cheapest falsifier
- define one or two export-complexity proxies
- train with a weak regularizer toward them
- check whether proxy gains produce better post-roundtrip val_bpb
Kill it if the proxies improve but the real artifact metric does not.
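The measurement harness this falsifier needs is just a faithful roundtrip: run the export path, count bytes, decode, and score. A minimal sketch, assuming a quantize-pack-compress export and using a grid-snapped copy of the weights as a synthetic stand-in for a proxy-regularized checkpoint (no training is actually run here):

```python
import zlib
import numpy as np

def roundtrip(w, step=0.05):
    # Stand-in export path: quantize to a uniform grid, pack, compress,
    # then decode back. Returns the decoded weights and the artifact size.
    codes = np.round(w / step).astype(np.int16)
    blob = zlib.compress(codes.tobytes(), level=9)
    decoded = np.frombuffer(zlib.decompress(blob), dtype=np.int16)
    return decoded.astype(np.float32) * step, len(blob)

rng = np.random.default_rng(0)
w_base = rng.normal(0.0, 0.2, size=4096).astype(np.float32)
# Synthetic "regularized" weights: snapped to a coarser grid, standing in
# for a checkpoint whose proxies improved (fewer distinct code values).
w_reg = (np.round(w_base / 0.1) * 0.1).astype(np.float32)

dec_base, bytes_base = roundtrip(w_base)
dec_reg, bytes_reg = roundtrip(w_reg)
print(bytes_base, bytes_reg)
```

The real experiment replaces the synthetic snap with actual proxy-regularized training and replaces reconstruction error with post-roundtrip val_bpb; the kill criterion is then a straight comparison of the two runs on that metric at matched bytes.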
What would make it real
- post-roundtrip quality improves at fixed bytes
- the model develops more repeated, structured exceptions rather than random fragile ones
- gains survive final packing rather than appearing only in pre-export analysis