FiPS: Learning Parameter Sharing with Tensor Decompositions and Sparsity (Üyük et al., 2024)

Sources: arXiv:2411.09816 · alphaXiv overview

Core contribution

FiPS argues that parameter sharing should be learned at a finer grain than whole-layer tying. By combining tensor decomposition and sparsity, it tries to preserve useful shared structure while still letting different parts of the network specialize.
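To make the idea concrete, here is a minimal sketch of fine-grained sharing: several layers reconstruct their weights from one shared basis, while a sparse, layer-specific factor provides the specialization. All shapes, names (`shared_basis`, `layer_weight`), and the density choice are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n_layers = 64, 16, 4          # hidden dim, shared-basis rank, layer count
density = 0.25                      # fraction of nonzeros kept per layer

# One basis shared by every layer (hypothetical shapes, not the paper's exact ones).
shared_basis = rng.normal(size=(d, k))

def layer_weight(layer_id):
    """Reconstruct a d x d weight from the shared basis and a sparse,
    layer-specific factor: W_l = B @ S_l, with S_l mostly zero."""
    factor = rng.normal(size=(k, d))
    mask = rng.random(size=(k, d)) < density   # sparse specialization
    return shared_basis @ (factor * mask)

weights = [layer_weight(i) for i in range(n_layers)]

# Rough storage comparison: fully independent layers vs. basis + nonzeros.
dense_params = n_layers * d * d
shared_params = d * k + n_layers * int(density * k * d)
print(dense_params, shared_params)
```

Even in this toy setting, the shared variant stores a fraction of the values while still giving each layer a distinct effective weight matrix.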

Why this matters for Parameter Golf

This paper is valuable less as a literal recipe and more as a search-space expansion. It shows that the real choice is not limited to:

  • full independent layers, or
  • one shared recurrent block

There is a richer middle ground where the model shares bases, factors, or subspaces, and that flexibility is exactly what could make recursive width scaling more robust under a hard artifact cap.

What to import

  • Sharing should be structured, not binary.
  • Sparsity is a natural companion to reuse. Shared structure and selective specialization can coexist.
  • Nearby functions may want nearby factors. Grouped or local sharing may be easier to exploit than arbitrary global tying.
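The "nearby functions, nearby factors" point can be sketched as a grouping rule: neighboring layers reuse one basis, distant layers get a different one. The function name `grouped_bases`, the group size, and the shapes are hypothetical; only the grouping idea comes from the notes above.

```python
import numpy as np

def grouped_bases(n_layers, group_size, d, k, seed=0):
    """One shared basis per group of neighboring layers: layers in the
    same group reuse a basis, layers in different groups do not."""
    rng = np.random.default_rng(seed)
    n_groups = -(-n_layers // group_size)          # ceiling division
    bases = [rng.normal(size=(d, k)) for _ in range(n_groups)]
    assignment = [layer // group_size for layer in range(n_layers)]
    return bases, assignment

# 8 layers, groups of 2: adjacent layers share, distant layers specialize.
bases, assignment = grouped_bases(n_layers=8, group_size=2, d=64, k=16)
print(assignment)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

The grouping here stores 4 bases instead of 8 independent ones or a single fully tied one, sitting between the two extremes listed earlier.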

What not to over-import

Tensor decompositions can introduce implementation and metadata complexity, and success in a paper setting does not mean the same factorization is the best use of bytes in a submission artifact. The main transferable lesson is to stop equating compression with strict tying.

Parameter Golf translation

FiPS suggests exploring designs like:

  • shared MLP bases with lightweight depth-specific factors
  • grouped sharing where neighboring steps share more than distant steps
  • partial tying that spends a small amount of bytes on specialization rather than on fully separate blocks
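The third design above can be checked with back-of-envelope byte accounting: one shared block plus a tiny low-rank delta per recursion step, compared against fully separate blocks. The dimensions, rank, step count, and fp16 storage assumption are all illustrative, not taken from the paper.

```python
# Hypothetical byte accounting for partial tying under a hard artifact cap.
BYTES_PER_PARAM = 2                  # assume fp16 storage

def block_params(d):
    """Parameter count of one dense d x d block."""
    return d * d

def low_rank_delta_params(d, r):
    """Per-step specialization: two d x r factors forming a rank-r update."""
    return 2 * d * r

d, steps, rank = 512, 6, 8

# Fully separate blocks: one dense block per step.
separate_bytes = steps * block_params(d) * BYTES_PER_PARAM

# Partial tying: one shared block plus a cheap delta per step.
partial_bytes = (block_params(d) + steps * low_rank_delta_params(d, rank)) * BYTES_PER_PARAM

print(separate_bytes, partial_bytes)
```

Under these made-up numbers, the specialization deltas cost a small slice of the budget while most bytes buy the shared block, which is the trade the bullet list argues for.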

Reference

Üyük, C., Lasby, M., Yassin, M., Evci, U., & Ioannou, Y. (2024). Learning parameter sharing with tensor decompositions and sparsity. arXiv preprint arXiv:2411.09816. https://arxiv.org/abs/2411.09816