Hypothesis

A compact recurrent model may outperform a larger static model if it can spend a small, bounded amount of extra evaluation-time compute on refinement, planning, or reranking.

Why this is plausible

Under a hard artifact cap, stored parameters and evaluation-time compute are partly substitutable resources. A model that already reuses shared blocks across depth is especially well positioned to exploit this exchange.

This makes iterative refinement a natural extension of weight sharing: blocks that are already reused across depth can also be reused across additional inference passes.

Candidate forms

  • extra recurrent passes on difficult examples
  • shallow self-refinement loops before final prediction
  • generate-then-rerank behavior inside the same compact model family
  • planning-style intermediate computation rather than a single forward path
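The first candidate form can be sketched with a toy fixed-point iteration, standing in for a weight-shared block that is applied a variable number of times. The names (`refine_step`, `fixed_point_refine`) and the scalar update rule are illustrative assumptions, not a real model:

```python
# Toy sketch of "extra recurrent passes on difficult examples":
# one weight-shared update rule applied repeatedly, stopping early
# once the state stops changing. All names here are illustrative.

def refine_step(state: float, target: float) -> float:
    """One weight-shared pass: move the state halfway toward a fixed point."""
    return state + 0.5 * (target - state)

def fixed_point_refine(x: float, target: float,
                       max_passes: int = 8, tol: float = 1e-3):
    """Apply the shared block up to max_passes times; inputs already
    near the fixed point exit after few passes, hard ones use more."""
    for passes in range(1, max_passes + 1):
        new_x = refine_step(x, target)
        if abs(new_x - x) < tol:
            return new_x, passes
        x = new_x
    return x, max_passes

easy, easy_passes = fixed_point_refine(0.99, 1.0)   # near the answer
hard, hard_passes = fixed_point_refine(-5.0, 1.0)   # far from the answer
assert easy_passes < hard_passes  # difficult inputs consume more compute
```

The point of the sketch is only the control flow: per-example compute is bounded by `max_passes` but varies with difficulty, which is exactly the exchange of evaluation-time compute for stored parameters described above.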

What would support it

  • a smaller artifact overtaking a larger static baseline once limited extra compute is allowed
  • recurrent architectures benefiting more than non-recurrent ones from extra inference steps
  • the quality gain per extra inference step staying favorable for at least a short range
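The third criterion can be checked mechanically: record error after each additional pass and look at the marginal improvement. The toy update rule below is a placeholder assumption for a real model's refinement step:

```python
# Sketch of testing "quality gain per extra inference step stays
# favorable for a short range": measure error as a function of pass
# count, then inspect marginal gains. The update rule is a toy stand-in.

def error_after(passes: int, x0: float = -5.0, target: float = 1.0) -> float:
    """Error remaining after a given number of refinement passes."""
    x = x0
    for _ in range(passes):
        x = x + 0.5 * (target - x)  # one weight-shared pass
    return abs(target - x)

errors = [error_after(k) for k in range(6)]
gains = [errors[k] - errors[k + 1] for k in range(5)]

# Supportive outcome: every extra pass helps, but gains shrink,
# so the exchange is favorable only over a short range of steps.
assert all(g > 0 for g in gains)
assert gains[0] > gains[-1]
```

In a real evaluation, `error_after` would be replaced by task loss or accuracy at each inference-step budget; the diminishing-returns shape of `gains` is what decides where the budget cap should sit.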

Main risks

  • wall-clock costs overwhelm the quality gain
  • improvements come from simple reranking tricks that do not generalize
  • the underlying model is too weak for refinement to rescue it