10 items with this tag.
moonshots
Moonshot hypothesis that many apparently different weight tensors could be stored as one canonical prototype plus cheap transport maps, rather than as separate weights.
papers
Paper note on using reinforcement learning during training to decide which transformer layers should share weights and which should remain independent.
hypotheses
Hypothesis that tiny per-depth conditioning can recover much of the specialization lost by strict parameter sharing.
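A minimal sketch of what "tiny per-depth conditioning" could look like, assuming FiLM-style per-depth scale and shift vectors (that specific mechanism is an illustrative choice, not something the note specifies): one shared block is reused at every depth, and each depth pays only 2*H extra parameters instead of a full unique H*H block.

```python
import numpy as np

rng = np.random.default_rng(0)
H, DEPTH = 64, 6

# One shared block reused at every depth (strict parameter sharing)...
W_shared = rng.standard_normal((H, H)) / np.sqrt(H)

# ...plus tiny per-depth conditioning: a learned scale and shift per depth
# (FiLM-style; 2*H params per depth vs H*H for a fully unique block).
scales = np.ones((DEPTH, H))
shifts = np.zeros((DEPTH, H))

def forward(x):
    for d in range(DEPTH):
        h = np.tanh(x @ W_shared)
        x = scales[d] * h + shifts[d]   # cheap depth-specific modulation
    return x

x = rng.standard_normal(H)
y = forward(x)
print(y.shape)
```

The point of the arithmetic: all six depths of conditioning together (2 * 64 * 6 = 768 params) cost less than one extra unique 64x64 block (4096 params), which is the capacity-for-bytes trade the hypothesis is about.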
hypotheses
Hypothesis that storing fewer unique layers and spending the savings on width or lightweight per-layer adaptation is a better artifact trade than many fully unique blocks.
ideas
Hypothesis that shrinking tokenizer and LM-head burden, then reinvesting the saved bytes into a wider shared backbone, beats spending the same budget on a larger static head.
lanes
Why parameter sharing may be the cleanest way to buy width, extra compute, or light specialization under a hard artifact cap.
notes
Synthesis note on why shared-depth transformer designs are attractive under a hard artifact budget, and where they usually break.
papers
Paper note on cross-layer parameter sharing and factorized embeddings as two clean ways to reduce stored parameters without simply shrinking hidden capacity.
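Back-of-envelope arithmetic for the two techniques in this note, with illustrative (not paper-exact) sizes for vocab V, hidden width H, small embedding dim E, depth L, and a rough per-layer cost P:

```python
# Illustrative sizes; the per-layer cost P is an order-of-magnitude stand-in
# for attention + MLP parameters, not an exact architecture count.
V, H, E, L = 30_000, 768, 128, 12

# Factorized embeddings: the V*H token table becomes V*E + E*H.
full_embed = V * H                  # 23,040,000
factored_embed = V * E + E * H      #  3,938,304

# Cross-layer sharing: L unique blocks of P params become one shared block.
P = 12 * H * H
unique_layers = L * P
shared_layers = P

print(f"embedding: {full_embed:,} -> {factored_embed:,}")
print(f"layers:    {unique_layers:,} -> {shared_layers:,}")
```

Neither move shrinks the hidden width H itself, which is why the note frames them as ways to cut stored parameters without simply shrinking hidden capacity.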
papers
Paper note on learning structured parameter sharing with tensor decompositions and sparsity instead of treating sharing as all-or-nothing layer tying.
papers
Paper note on converting pretrained transformers into recursive/shared-parameter models with lightweight depth-specific relaxation.
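A hedged sketch of one plausible conversion recipe along these lines (the prototype-from-layer-mean initialization and the low-rank form of the relaxation are assumptions for illustration, not the paper's exact method): tie all layers to a shared prototype, then keep a small per-depth low-rank delta initialized from each layer's residual, to be fine-tuned afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)
H, L, R = 32, 4, 2   # hidden size, depth, delta rank (all illustrative)

# Stand-in for pretrained per-layer weights (in practice: a loaded checkpoint).
pretrained = [rng.standard_normal((H, H)) / np.sqrt(H) for _ in range(L)]

# Step 1: tie all layers to one shared prototype (here: the layer-wise mean).
W_shared = np.mean(pretrained, axis=0)

# Step 2: lightweight depth-specific relaxation as low-rank deltas,
# initialized toward each layer's residual from the prototype.
deltas = []
for W in pretrained:
    U, S, Vt = np.linalg.svd(W - W_shared)
    A = U[:, :R] * S[:R]         # (H, R)
    B = Vt[:R, :]                # (R, H)
    deltas.append((A, B))        # effective layer d: W_shared + A @ B

# Storage: L unique blocks vs one shared block plus tiny per-depth deltas.
unique_cost = L * H * H
shared_cost = H * H + L * 2 * H * R
print(unique_cost, shared_cost)
```

The stored artifact shrinks from L*H*H to H*H + L*2*H*R, and the deltas are the "lightweight depth-specific relaxation" that keeps depths from collapsing into one identical function.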