9 items with this tag.
moonshots
Moonshot hypothesis that repeated depth should specialize through a persistent internal role state rather than through stored layer-specific parameters.
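A minimal PyTorch sketch of what such a persistent role state could look like: one weight-tied block reused at every depth, with a small state vector injected as a bias on the block input and updated from a pooled summary after each pass. All module names and sizes here are illustrative assumptions, not taken from the note.

```python
import torch
import torch.nn as nn

class SharedBlockWithRoleState(nn.Module):
    """One weight-tied block reused at every depth; a small role-state
    vector carries depth context instead of per-layer parameters."""
    def __init__(self, d_model=256, d_state=32, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # role state is injected as a bias on the block input ...
        self.state_to_bias = nn.Linear(d_state, d_model)
        # ... and evolved from a pooled summary of the activations
        self.state_update = nn.GRUCell(d_model, d_state)

    def forward(self, x, state):
        h = x + self.state_to_bias(state).unsqueeze(1)  # condition on role state
        q = self.norm1(h)
        h = h + self.attn(q, q, q)[0]
        h = h + self.mlp(self.norm2(h))
        state = self.state_update(h.mean(dim=1), state)  # update the role
        return h, state

block = SharedBlockWithRoleState()
x = torch.randn(2, 16, 256)       # (batch, seq, d_model)
state = torch.zeros(2, 32)
for _ in range(12):               # 12 "layers" of depth, one set of stored weights
    x, state = block(x, state)
```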
hypotheses
Hypothesis that tiny per-depth conditioning can recover much of the specialization lost by strict parameter sharing.
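One plausible form of tiny per-depth conditioning is a FiLM-style learned scale and shift per depth on an otherwise fully shared block. The sketch below assumes that mechanism and hypothetical sizes; the note itself does not commit to this exact variant.

```python
import torch
import torch.nn as nn

class DepthConditionedSharedMLP(nn.Module):
    """One shared MLP block plus only 2*d_model extra parameters per
    depth: a learned scale and shift that lets each depth behave
    slightly differently despite identical core weights."""
    def __init__(self, d_model=256, n_depths=12):
        super().__init__()
        self.core = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))
        self.norm = nn.LayerNorm(d_model, elementwise_affine=False)
        self.scale = nn.Parameter(torch.ones(n_depths, d_model))   # per-depth gamma
        self.shift = nn.Parameter(torch.zeros(n_depths, d_model))  # per-depth beta

    def forward(self, x):
        for d in range(self.scale.shape[0]):
            h = self.norm(x) * self.scale[d] + self.shift[d]
            x = x + self.core(h)
        return x

m = DepthConditionedSharedMLP()
# conditioning adds only n_depths * 2 * d_model = 6144 of ~532k parameters
print(sum(p.numel() for p in m.parameters()))
```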
hypotheses
Concrete architecture hypothesis: use aggressive depth sharing to buy much more width, then spend leftover bytes on stability and selective precision.
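Back-of-envelope arithmetic for the trade, assuming the standard approximation of ~12·d² parameters per transformer block and ignoring embeddings and biases:

```python
# Rough parameter arithmetic: a transformer block costs about 12*d^2
# parameters (4*d^2 for Q/K/V/O attention projections, 8*d^2 for a 4x MLP).
def block_params(d_model):
    return 12 * d_model * d_model

budget = 12 * block_params(768)      # baseline: 12 unique blocks at d_model=768
print(f"{budget / 1e6:.1f}M")        # ~84.9M parameters

# One shared block reused 12 times can instead grow until that single
# block fills the whole budget: solve 12*d^2 = budget.
shared_d = int((budget / 12) ** 0.5)
print(shared_d)                      # ~2660: the same bytes buy ~3.5x the width
```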
hypotheses
Hypothesis that storing fewer unique layers and spending the savings on width or lightweight per-layer adaptation is a better artifact trade than storing many fully unique blocks.
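A LoRA-style sketch of the "lightweight per-layer adaptation" half of that trade, with assumed sizes: one shared matrix plus a small rank-8 adapter per depth, instead of a full unique matrix per depth.

```python
import torch
import torch.nn as nn

class SharedLinearWithDepthAdapters(nn.Module):
    """One shared weight matrix plus a rank-r low-rank adapter per depth:
    stores d*d + n_depths*2*r*d parameters instead of n_depths*d*d."""
    def __init__(self, d=512, n_depths=12, r=8):
        super().__init__()
        self.shared = nn.Linear(d, d, bias=False)
        self.down = nn.Parameter(torch.randn(n_depths, d, r) * 0.02)
        self.up = nn.Parameter(torch.zeros(n_depths, r, d))  # zero-init: adapters start as no-ops

    def forward(self, x, depth):
        return self.shared(x) + (x @ self.down[depth]) @ self.up[depth]

m = SharedLinearWithDepthAdapters()
unique = 12 * 512 * 512                          # cost of 12 fully unique matrices
shared = sum(p.numel() for p in m.parameters())  # shared matrix + all adapters
print(unique, shared)                            # ~3.15M vs ~0.36M parameters
```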
hypotheses
Synthesis hypothesis that the strongest compact artifacts will combine shared depth, activation discipline, selective precision, and cheap specialization rather than relying on one trick alone.
lanes
Why parameter sharing may be the cleanest way to buy width, extra compute, or light specialization under a hard artifact cap.
notes
Synthesis note on the recurring compact-model idea that repeated computation can substitute for stored parameters.
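A toy illustration of the substitution, using bare linear layers as stand-ins for full blocks: looping one stored layer gives the same depth of computation as an untied stack at a twelfth of the storage.

```python
import torch.nn as nn

d, depth = 512, 12
untied = nn.ModuleList(nn.Linear(d, d) for _ in range(depth))  # 12 stored layers
tied = nn.Linear(d, d)                                          # 1 stored layer, looped 12x

def params(mods):
    return sum(p.numel() for m in mods for p in m.parameters())

print(params(untied))   # ~3.15M stored parameters
print(params([tied]))   # ~0.26M: same compute depth at 1/12 the storage
```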
notes
Synthesis note on why recurrent transformers often need tiny phase-specific signals instead of perfectly identical behavior at every depth.
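One way to supply such a phase-specific signal without storing anything per depth is a sinusoidal depth encoding, analogous to position encodings but indexed by iteration. The sketch below assumes that parameter-free variant; the note does not prescribe a specific mechanism.

```python
import torch

def phase_signal(depth, d_model, max_depth=64):
    """A tiny, parameter-free per-depth signal (sinusoidal over depth)
    that breaks the symmetry of a weight-tied loop without storing
    anything per layer."""
    i = torch.arange(d_model // 2)
    freqs = torch.exp(-torch.log(torch.tensor(float(max_depth))) * i / (d_model // 2))
    angle = depth * freqs
    return torch.cat([torch.sin(angle), torch.cos(angle)])

# inject a different (but unlearned, unstored) signal at each iteration
x = torch.randn(2, 16, 256)
for depth in range(12):
    x = x + phase_signal(depth, 256)  # broadcasts over batch and sequence
    # ... the shared block would run here
```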
experiments
A breadth-profile local test of the recurrent-wide-architecture idea: aggressive depth sharing plus width expansion under the artifact cap.
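A hypothetical sweep grid for such a test, reusing the ~12·d² block approximation from above. The budget and configurations are illustrative assumptions, not the experiment's actual settings.

```python
# Hold the parameter budget fixed and trade unique depth for width;
# loops add compute at each config but no storage.
BUDGET = 85_000_000  # hard artifact cap, in parameters (assumed)

def width_for(unique_blocks):
    # widest d_model whose unique blocks fit the budget, via 12*d^2 per block
    return int((BUDGET / (12 * unique_blocks)) ** 0.5)

for unique, loops in [(12, 1), (4, 3), (2, 6), (1, 12)]:
    d = width_for(unique)
    print(f"{unique} unique blocks x {loops} loops -> d_model ~ {d}")
# width grows roughly as the square root of the number of blocks freed
```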