Core idea

A hard artifact cap makes stored parameters unusually precious. That pushes compact-model design toward mechanisms that can substitute extra computation for extra stored weights.
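A minimal numpy sketch of the substitution, with hypothetical sizes (width 64, depth 6): a recurrent model applies one shared block several times, spending the same run-time compute as a stack of distinct blocks while storing a fraction of the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d, depth = 64, 6  # hypothetical width and depth

# Baseline: `depth` distinct blocks, each its own stored matrix.
distinct = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(depth)]

# Recurrent alternative: one shared block applied `depth` times.
# Same run-time compute, one sixth of the stored weights.
shared = rng.standard_normal((d, d)) / np.sqrt(d)

def run(blocks, x):
    for w in blocks:
        x = np.tanh(x @ w)  # identical per-step computation either way
    return x

x = rng.standard_normal(d)
y_distinct = run(distinct, x)
y_shared = run([shared] * depth, x)

params_distinct = sum(w.size for w in distinct)  # depth * d * d
params_shared = shared.size                      # d * d
print(params_distinct // params_shared)          # 6
```

The ratio of stored parameters is exactly the reuse factor, which is what makes each stored weight precious under a hard cap.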

Where the exchange appears

This same pattern shows up in several places:

  • recursive or cross-layer weight sharing, where one block is reused at several depths
  • evaluation-time compute, where extra forward passes stand in for extra stored parameters
  • compression-aware design, where quantization and vocabulary choices decide which bytes are worth storing

Why it is attractive

If the model can reuse a strong core computation, the saved bytes can be redirected into:

  • width
  • selective precision
  • a smaller set of high-impact special-case parameters
  • a better tokenizer or output path

The exchange is therefore not only architectural. It reaches into quantization and vocabulary decisions too.
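Back-of-envelope arithmetic makes the redirection concrete. All sizes here are hypothetical: six distinct fp16 blocks of width 512 define a byte budget, and sharing one block frees bytes that can buy either a wider matrix or higher precision.

```python
import math

# Hypothetical budget: six distinct fp16 blocks of width 512.
depth, width, fp16_bytes = 6, 512, 2
budget = depth * width * width * fp16_bytes  # 3,145,728 bytes

# Sharing one block frees the budget for a single matrix:
# (a) redirected into width, keeping fp16
width_wider = math.isqrt(budget // fp16_bytes)
# (b) redirected into selective precision, storing fp32 instead
fp32_bytes = 4
width_fp32 = math.isqrt(budget // fp32_bytes)
print(width_wider, width_fp32)  # 1254 886
```

The same bytes could instead fund a larger vocabulary or output path; the point is only that sharing converts an architectural choice into a spendable byte budget.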

Why it often fails

Compute is not free. This exchange breaks when:

  • repeated passes do not add genuinely new capability
  • a shared block lacks enough cheap specialization
  • extra compute creates wall-clock costs that overwhelm the benefit
  • aggressive recurrence worsens activation or outlier problems more than compression can recover
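The specialization failure in particular has a cheap partial remedy worth illustrating. The sketch below is an assumption-laden example, not a prescription: it gives a shared block a per-depth scale and bias (akin to depth-dependent normalization parameters), letting each application behave slightly differently at a tiny fraction of the cost of distinct blocks.

```python
import numpy as np

rng = np.random.default_rng(1)
d, depth = 64, 6  # hypothetical width and depth

shared = rng.standard_normal((d, d)) / np.sqrt(d)  # one stored block

# Cheap per-depth specialization: one scale and one bias vector
# per application of the shared block.
scales = np.ones((depth, d))
biases = np.zeros((depth, d))

def run(x):
    for t in range(depth):
        x = np.tanh((x @ shared) * scales[t] + biases[t])
    return x

y = run(rng.standard_normal(d))

extra = scales.size + biases.size  # 2 * depth * d        = 768
full = (depth - 1) * d * d         # cost of distinct blocks = 20480
print(extra, full)
```

When even this kind of cheap modulation is not enough for the layers to play distinct roles, the exchange is likely to fail for the reasons listed above.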

Practical takeaway

The right framing is not “reuse is always better.” It is:

“When are a few strong reusable transformations better than many weak stored ones?”

That question connects recursive sharing, evaluation-time compute, and compression-aware robustness.