Moonshot
Let one shared block behave like many conceptual layers by carrying a tiny persistent role state across passes.
Instead of storing distinct per-layer parameters, store a tiny internal program state that marks which role the current pass plays, e.g.:
- lexical pass
- structural pass
- consolidation pass
- logits-sharpening pass
Why this is outside the current prior
Current shared-depth work usually keeps specialization in small stored adapters, LoRA slices, or norm scales. This moonshot moves specialization into dynamic state, not static parameter deltas.
That is a stronger departure from the normal transformer prior.
Mechanism sketch
- one shared backbone block
- one tiny recurrent role memory
- role memory updates every pass
- block reads token state + role state together
- maybe only tiny norm/gating parameters are stored explicitly
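The mechanism above can be sketched in a few lines. This is a minimal toy, not an implementation: the sizes, the tanh nonlinearity, and the mean-pooled summary feeding the role update are all placeholder assumptions; the point is only the wiring (one shared weight block, one tiny recurrent role memory that the block reads alongside token state).

```python
import numpy as np

rng = np.random.default_rng(0)

D, R, N_PASSES = 64, 8, 4  # token dim, role-state dim, pass count (hypothetical sizes)

# One shared backbone block: a single weight matrix reused on every pass.
W_block = rng.standard_normal((D + R, D)) / np.sqrt(D + R)
# Tiny recurrent role memory: its own small update matrix.
W_role = rng.standard_normal((R + D, R)) / np.sqrt(R + D)

def forward(tokens: np.ndarray) -> np.ndarray:
    """Run the same block N_PASSES times, threading a role state through."""
    role = np.zeros(R)  # persistent role state, reset once per sequence
    x = tokens
    for _ in range(N_PASSES):
        # Block reads token state + role state together.
        joint = np.concatenate(
            [x, np.broadcast_to(role, (x.shape[0], R))], axis=-1
        )
        x = x + np.tanh(joint @ W_block)  # residual update with shared weights
        # Role memory updates every pass from a pooled summary of the tokens.
        summary = x.mean(axis=0)
        role = np.tanh(np.concatenate([role, summary]) @ W_role)
    return x

out = forward(rng.standard_normal((10, D)))
```

Note that the only stored objects are `W_block` and `W_role`; everything role-specific lives in the `role` vector, which exists only at evaluation time.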
Why it might matter for Parameter Golf
If specialization can live in state transitions instead of stored unique weights, then the artifact can stay tiny while effective depth behavior remains diverse.
This is especially attractive when evaluation-time compute is cheaper than permanently storing more unique layers.
Cheapest falsifier
- inspect whether repeated passes actually separate into distinct role behavior
- test whether role state collapses into one mode
- compare against fixed pass-index conditioning
Kill it if role state adds complexity without producing distinct useful phases.
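The collapse test can be made concrete with a cheap diagnostic: log the role state after each pass and measure mean pairwise cosine similarity. This is a sketch under the assumption that role states are plain vectors; the threshold for "collapsed" is a judgment call, not part of the idea.

```python
import numpy as np

def role_collapse_score(role_states: np.ndarray, eps: float = 1e-8) -> float:
    """Mean pairwise cosine similarity of per-pass role states.

    role_states: (n_passes, role_dim) array logged during one forward run.
    Near 1.0 -> the role state has collapsed into one mode; lower values
    suggest the passes occupy genuinely distinct roles.
    """
    norms = np.linalg.norm(role_states, axis=1, keepdims=True)
    normed = role_states / (norms + eps)
    sims = normed @ normed.T
    n = len(role_states)
    off_diag = sims[~np.eye(n, dtype=bool)]  # drop self-similarities
    return float(off_diag.mean())

# Synthetic extremes for illustration:
# collapsed: every pass lands on (almost) the same state.
collapsed = np.ones((4, 8)) + 1e-3 * np.random.default_rng(0).standard_normal((4, 8))
# differentiated: orthogonal one-hot roles.
distinct = np.eye(4, 8)
```

Running the same score against the fixed pass-index-conditioning baseline gives the comparison in the third bullet for free: if learned role states separate no better than a hard-coded pass index, the recurrent memory is not earning its complexity.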
What would make it real
- clear behavioral differentiation across passes
- better post-roundtrip quality than static shared-depth at equal bytes
- only tiny extra stored state machinery