5 items with this tag.
frontiers
Frontier synthesis on why recursive models may become compelling only once normalization and tiny phase-specific adaptation are treated as part of the compression interface.
hypotheses
Hypothesis that an extra RMSNorm before projections improves post-roundtrip quality by stabilizing low-bit training and export.
ideas
Hypothesis that shared-depth models can recover most layer-role specialization using only per-step RMSNorm and tiny channel gates, with almost no byte cost.
notes
Why pre-projection normalization is a recurring pattern in low-bit and compression-aware transformer design.
papers
Paper note on the claim that an extra RMSNorm before linear projections is a disproportionately strong stabilizer for extreme low-bit finetuning.
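The pre-projection RMSNorm pattern these items circle around can be sketched in a few lines. This is an illustrative NumPy mock-up, not code from any of the notes above: `rmsnorm` is a standard RMSNorm (scale inputs to unit root-mean-square, then apply a learned per-channel gain), and the toy projection shows the intuition that normalizing immediately before a matmul bounds the activation scale the (potentially low-bit) weights must absorb.

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-6):
    # RMSNorm: divide by the root-mean-square over the channel axis,
    # then apply a learned per-channel gain (no mean subtraction).
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
x = rng.normal(scale=50.0, size=(2, d_in))   # wildly scaled activations
W = rng.normal(size=(d_in, d_out))           # stand-in projection weights
gain = np.ones(d_in)

# Pre-projection normalization: the matmul now sees unit-RMS inputs,
# regardless of how large the incoming activations were.
y = rmsnorm(x, gain) @ W
```

With unit gain, the normalized activations have RMS ≈ 1 per row whatever the input scale, which is the stabilizing property the hypothesis and paper note attribute to inserting RMSNorm before linear projections.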