5 items with this tag.
frontiers
Frontier synthesis on why recursive models may become compelling only once normalization and tiny phase-specific adaptation are treated as part of the compression interface.
hypotheses
Hypothesis that an extra RMSNorm before projections improves post-roundtrip quality by stabilizing low-bit training and export.
ideas
Hypothesis that shared-depth models can recover most layer-role specialization using only per-step RMSNorm and tiny channel gates, with almost no byte cost.
notes
Why pre-projection normalization is a recurring pattern in low-bit and compression-aware transformer design.
papers
Paper note on the claim that an extra RMSNorm before linear projections is a disproportionately strong stabilizer for extreme low-bit finetuning.
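The pre-projection RMSNorm pattern these items circle around can be sketched in a few lines. This is an illustrative NumPy mock-up, not code from any of the notes above: `rmsnorm` is a standard RMSNorm (scale inputs to unit root-mean-square, then apply a learned per-channel gain), and the toy projection shows the intuition that normalizing immediately before a matmul bounds the activation scale the (potentially low-bit) weights must absorb.

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-6):
    # RMSNorm: divide by the root-mean-square over the channel axis,
    # then apply a learned per-channel gain (no mean subtraction).
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
x = rng.normal(scale=50.0, size=(2, d_in))   # wildly scaled activations
W = rng.normal(size=(d_in, d_out))           # stand-in projection weights
gain = np.ones(d_in)

# Pre-projection normalization: the matmul now sees unit-RMS inputs,
# regardless of how large the incoming activations were.
y = rmsnorm(x, gain) @ W
```

With unit gain, the normalized activations have RMS ≈ 1 per row whatever the input scale, which is the stabilizing property the hypothesis and paper note attribute to inserting RMSNorm before linear projections.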