7 items with this tag.
papers: Paper note on using reinforcement learning during training to decide which transformer layers should share weights and which should remain independent.
frontiers: Frontier synthesis on why recursive models may only become compelling once normalization and tiny phase-specific adaptation are treated as part of the compression interface.
notes: Synthesis note on why shared-depth transformer designs are attractive under a hard artifact budget, and where they usually break.
notes: Synthesis note on why recurrent transformers often need tiny phase-specific signals instead of perfectly identical behavior at every depth.
papers: Paper note on making Universal Transformers competitive through parameter sharing plus sparse expert capacity.
papers: Paper note on converting pretrained transformers into recursive/shared-parameter models with lightweight depth-specific relaxation.
papers: Paper note on recurrent self-attentive depth, dynamic halting, and the idea that transformers can trade stored depth for repeated computation.