7 items with this tag.
papers: Paper note on using reinforcement learning during training to decide which transformer layers should share weights and which should remain independent.
frontiers: Frontier synthesis on why recursive models may only become compelling once normalization and tiny phase-specific adaptation are treated as part of the compression interface.
notes: Synthesis note on why shared-depth transformer designs are attractive under a hard artifact budget, and where they usually break.
notes: Synthesis note on why recurrent transformers often need tiny phase-specific signals instead of perfectly identical behavior at every depth.
papers: Paper note on making Universal Transformers competitive through parameter sharing plus sparse expert capacity.
papers: Paper note on converting pretrained transformers into recursive/shared-parameter models with lightweight depth-specific relaxation.
papers: Paper note on recurrent self-attentive depth, dynamic halting, and the idea that transformers can trade stored depth for repeated computation.