Core idea

A hard artifact cap makes stored parameters unusually precious. That pushes compact-model design toward mechanisms that can substitute extra computation for extra stored weights.
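A minimal numpy sketch of the substitution, with hypothetical sizes (width 64, depth 6): a recurrent model applies one shared block several times, spending the same run-time compute as a stack of distinct blocks while storing a fraction of the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d, depth = 64, 6  # hypothetical width and depth

# Baseline: `depth` distinct blocks, each its own stored matrix.
distinct = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(depth)]

# Recurrent alternative: one shared block applied `depth` times.
# Same run-time compute, one sixth of the stored weights.
shared = rng.standard_normal((d, d)) / np.sqrt(d)

def run(blocks, x):
    for w in blocks:
        x = np.tanh(x @ w)  # identical per-step computation either way
    return x

x = rng.standard_normal(d)
y_distinct = run(distinct, x)
y_shared = run([shared] * depth, x)

params_distinct = sum(w.size for w in distinct)  # depth * d * d
params_shared = shared.size                      # d * d
print(params_distinct // params_shared)          # 6
```

The ratio of stored parameters is exactly the reuse factor, which is what makes each stored weight precious under a hard cap.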

Where the exchange appears

This same pattern shows up in several places:

  • recursive or cross-layer weight sharing, where one block is reused at several depths
  • evaluation-time compute, where extra forward passes stand in for extra stored parameters
  • compression-aware design, where quantization and vocabulary choices decide which bytes are worth storing

Why it is attractive

If the model can reuse a strong core computation, the saved bytes can be redirected into:

  • width
  • selective precision
  • a smaller set of high-impact special-case parameters
  • a better tokenizer or output path

The exchange is therefore not only architectural. It reaches into quantization and vocabulary decisions too.
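Back-of-envelope arithmetic makes the redirection concrete. All sizes here are hypothetical: six distinct fp16 blocks of width 512 define a byte budget, and sharing one block frees bytes that can buy either a wider matrix or higher precision.

```python
import math

# Hypothetical budget: six distinct fp16 blocks of width 512.
depth, width, fp16_bytes = 6, 512, 2
budget = depth * width * width * fp16_bytes  # 3,145,728 bytes

# Sharing one block frees the budget for a single matrix:
# (a) redirected into width, keeping fp16
width_wider = math.isqrt(budget // fp16_bytes)
# (b) redirected into selective precision, storing fp32 instead
fp32_bytes = 4
width_fp32 = math.isqrt(budget // fp32_bytes)
print(width_wider, width_fp32)  # 1254 886
```

The same bytes could instead fund a larger vocabulary or output path; the point is only that sharing converts an architectural choice into a spendable byte budget.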

Why it often fails

Compute is not free. This exchange breaks when:

  • repeated passes do not add genuinely new capability
  • a shared block lacks enough cheap specialization
  • extra compute creates wall-clock costs that overwhelm the benefit
  • aggressive recurrence worsens activation or outlier problems more than compression can recover
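The specialization failure in particular has a cheap partial remedy worth illustrating. The sketch below is an assumption-laden example, not a prescription: it gives a shared block a per-depth scale and bias (akin to depth-dependent normalization parameters), letting each application behave slightly differently at a tiny fraction of the cost of distinct blocks.

```python
import numpy as np

rng = np.random.default_rng(1)
d, depth = 64, 6  # hypothetical width and depth

shared = rng.standard_normal((d, d)) / np.sqrt(d)  # one stored block

# Cheap per-depth specialization: one scale and one bias vector
# per application of the shared block.
scales = np.ones((depth, d))
biases = np.zeros((depth, d))

def run(x):
    for t in range(depth):
        x = np.tanh((x @ shared) * scales[t] + biases[t])
    return x

y = run(rng.standard_normal(d))

extra = scales.size + biases.size  # 2 * depth * d        = 768
full = (depth - 1) * d * d         # cost of distinct blocks = 20480
print(extra, full)
```

When even this kind of cheap modulation is not enough for the layers to play distinct roles, the exchange is likely to fail for the reasons listed above.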

Practical takeaway

The right framing is not “reuse is always better.” It is:

“When are a few strong reusable transformations better than many weak stored ones?”

That question connects recursive sharing, evaluation-time compute, and compression-aware robustness.