Hypothesis

A compact recurrent model may outperform a larger static model if it can spend a small, bounded amount of extra evaluation-time compute on refinement, planning, or reranking.

Why this is plausible

Under a hard artifact cap, stored parameters and evaluation-time compute are partly substitutable resources. A model that already reuses shared blocks across depth is especially well positioned to exploit this exchange.

This makes iterative refinement a natural extension of weight sharing: blocks that are already reused across depth can also be reused across additional inference passes.

Candidate forms

  • extra recurrent passes on difficult examples
  • shallow self-refinement loops before final prediction
  • generate-then-rerank behavior inside the same compact model family
  • planning-style intermediate computation rather than a single forward path
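The first candidate form can be sketched with a toy fixed-point iteration, standing in for a weight-shared block that is applied a variable number of times. The names (`refine_step`, `fixed_point_refine`) and the scalar update rule are illustrative assumptions, not a real model:

```python
# Toy sketch of "extra recurrent passes on difficult examples":
# one weight-shared update rule applied repeatedly, stopping early
# once the state stops changing. All names here are illustrative.

def refine_step(state: float, target: float) -> float:
    """One weight-shared pass: move the state halfway toward a fixed point."""
    return state + 0.5 * (target - state)

def fixed_point_refine(x: float, target: float,
                       max_passes: int = 8, tol: float = 1e-3):
    """Apply the shared block up to max_passes times; inputs already
    near the fixed point exit after few passes, hard ones use more."""
    for passes in range(1, max_passes + 1):
        new_x = refine_step(x, target)
        if abs(new_x - x) < tol:
            return new_x, passes
        x = new_x
    return x, max_passes

easy, easy_passes = fixed_point_refine(0.99, 1.0)   # near the answer
hard, hard_passes = fixed_point_refine(-5.0, 1.0)   # far from the answer
assert easy_passes < hard_passes  # difficult inputs consume more compute
```

The point of the sketch is only the control flow: per-example compute is bounded by `max_passes` but varies with difficulty, which is exactly the exchange of evaluation-time compute for stored parameters described above.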

What would support it

  • a smaller artifact overtaking a larger static baseline once limited extra compute is allowed
  • recurrent architectures benefiting more than non-recurrent ones from extra inference steps
  • the quality gain per extra inference step staying favorable for at least a short range
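The third criterion can be checked mechanically: record error after each additional pass and look at the marginal improvement. The toy update rule below is a placeholder assumption for a real model's refinement step:

```python
# Sketch of testing "quality gain per extra inference step stays
# favorable for a short range": measure error as a function of pass
# count, then inspect marginal gains. The update rule is a toy stand-in.

def error_after(passes: int, x0: float = -5.0, target: float = 1.0) -> float:
    """Error remaining after a given number of refinement passes."""
    x = x0
    for _ in range(passes):
        x = x + 0.5 * (target - x)  # one weight-shared pass
    return abs(target - x)

errors = [error_after(k) for k in range(6)]
gains = [errors[k] - errors[k + 1] for k in range(5)]

# Supportive outcome: every extra pass helps, but gains shrink,
# so the exchange is favorable only over a short range of steps.
assert all(g > 0 for g in gains)
assert gains[0] > gains[-1]
```

In a real evaluation, `error_after` would be replaced by task loss or accuracy at each inference-step budget; the diminishing-returns shape of `gains` is what decides where the budget cap should sit.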

Main risks

  • wall-clock costs overwhelm the quality gain
  • improvements come from simple reranking tricks that do not generalize
  • the underlying model is too weak for refinement to rescue it