9 items with this tag.
frontiers
Frontier synthesis on the recent shift away from handcrafted post-training compression formats toward training-time rules and learned representations that directly target compressible model weights.
papers
Paper note on integrating rate-constrained compression pressure directly into LLM training rather than treating compression only as a post-training step.
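To make the idea concrete, here is a minimal sketch of one way rate pressure could enter the training loss: a differentiable per-weight bit-cost proxy added to the task objective. The proxy itself, the names `rate_penalty`, `LAMBDA`, and `DELTA`, and the heavy-tailed log-cost form are my assumptions, not the paper's construction.

```python
# A sketch only: rate-constrained training via a differentiable bit-cost
# proxy. LAMBDA and DELTA are illustrative hyperparameters (assumed).
import torch

LAMBDA = 1e-5    # assumed trade-off between task loss and rate pressure
DELTA = 2 ** -6  # assumed quantizer step the rate proxy is measured against

def rate_penalty(model: torch.nn.Module) -> torch.Tensor:
    """log2(1 + |w|/DELTA) roughly tracks the bit cost of each weight
    under a uniform quantizer with step DELTA and a heavy-tailed prior."""
    return sum(torch.log2(1.0 + p.abs() / DELTA).sum()
               for p in model.parameters())

def training_step(model, batch, task_loss_fn, optimizer):
    optimizer.zero_grad()
    loss = task_loss_fn(model, batch) + LAMBDA * rate_penalty(model)
    loss.backward()
    optimizer.step()
    return loss.item()
```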
papers
Paper note on making LLM training explicitly produce more low-rank, compressible weights by constraining Muon updates with a nuclear-norm budget.
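As a sketch of what a nuclear-norm budget on an update could look like: Euclidean projection onto the ball {U : ‖U‖* ≤ τ}, which amounts to soft-thresholding the update's singular values. Applying this to Muon's orthogonalized update, and the names and τ below, are my assumptions rather than the paper's exact rule.

```python
# A sketch only: clip an update matrix's nuclear norm to a budget tau by
# projecting its spectrum onto an L1 ball (Duchi-style projection).
import torch

def project_l1_ball(v: torch.Tensor, tau: float) -> torch.Tensor:
    """Project a nonnegative, descending-sorted vector onto {x >= 0, sum x <= tau}."""
    if v.sum() <= tau:
        return v
    cumsum = torch.cumsum(v, dim=0)
    k = torch.arange(1, v.numel() + 1, device=v.device)
    rho = torch.nonzero(v - (cumsum - tau) / k > 0).max()  # last active index
    theta = (cumsum[rho] - tau) / (rho + 1)
    return torch.clamp(v - theta, min=0.0)

def nuclear_norm_project(update: torch.Tensor, tau: float) -> torch.Tensor:
    """Clip the update's nuclear norm to tau, biasing it toward low rank."""
    U, S, Vh = torch.linalg.svd(update, full_matrices=False)
    S_proj = project_l1_ball(S, tau)  # S from svd is already sorted descending
    return (U * S_proj) @ Vh
```

Because the projection removes mass from the trailing singular values first, repeated application biases the accumulated weights toward low rank, which is the compressibility the note describes.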
moonshots
Moonshot hypothesis that models should be trained to survive multiple distinct artifact failure modes rather than a single clean quantization path.
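One concrete reading of "multiple failure modes", sketched below: sample a different fake deployment corruption each training step and backpropagate through it with a straight-through estimator. The particular modes here (int8, int4, 2:4 sparsity) are my illustrative picks, not the note's list.

```python
# A sketch only: train through one randomly sampled artifact failure mode
# per step, rather than a single clean quantization path.
import random
import torch

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    return w + (q - w).detach()  # straight-through gradient

def two_four_sparse(w: torch.Tensor) -> torch.Tensor:
    """Zero the 2 smallest-magnitude entries in each group of 4.
    Assumes w.numel() is divisible by 4."""
    groups = w.reshape(-1, 4)
    idx = groups.abs().topk(2, dim=1, largest=False).indices
    mask = torch.ones_like(groups).scatter(1, idx, 0.0)
    masked = (groups * mask).reshape(w.shape)
    return w + (masked - w).detach()

ARTIFACT_MODES = [
    lambda w: fake_quant(w, bits=8),
    lambda w: fake_quant(w, bits=4),
    two_four_sparse,
]

def corrupted_weight(w: torch.Tensor) -> torch.Tensor:
    return random.choice(ARTIFACT_MODES)(w)  # one failure mode per call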
moonshots
Moonshot hypothesis that the model should be trained directly to become a good compressed artifact, not merely a good floating-point checkpoint.
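A minimal sketch of that framing, assuming the deployment artifact is per-channel int4: the float weights act only as latent parameters, and the forward pass (hence the loss) always goes through the compressed form, so gradient descent grades the artifact itself.

```python
# A sketch only: the layer always computes with its compressed weights;
# per-channel int4 is an illustrative format choice, not the note's.
import torch

class CompressedLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        qmax = 7  # int4 symmetric range
        scale = self.weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
        q = (self.weight / scale).round().clamp(-8, 7) * scale
        w_artifact = self.weight + (q - self.weight).detach()  # STE
        return torch.nn.functional.linear(x, w_artifact, self.bias)
```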
papers
Paper note on using reinforcement learning during training to decide which transformer layers should share weights and which should remain independent.
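As a sketch of how RL could pick a sharing pattern: a Bernoulli policy per layer ("tie this layer's weights to the previous one" vs "keep it independent"), scored by a reward trading quality against parameter count, updated with REINFORCE. The action space, the reward shape, and the placeholder below are assumptions, not the paper's design.

```python
# A sketch only: REINFORCE over per-layer weight-sharing decisions.
import torch

n_layers = 12
logits = torch.zeros(n_layers - 1, requires_grad=True)  # layer 0 always independent
policy_opt = torch.optim.Adam([logits], lr=0.05)

def reward(share_mask: torch.Tensor) -> float:
    """Placeholder: should run a short proxy-training and return
    -(val_loss + size_cost). Here: a fake reward favoring ~50% sharing."""
    return -abs(share_mask.float().mean().item() - 0.5)

for step in range(200):
    dist = torch.distributions.Bernoulli(logits=logits)
    share_mask = dist.sample()                   # 1 = share with previous layer
    loss = -reward(share_mask) * dist.log_prob(share_mask).sum()  # REINFORCE
    policy_opt.zero_grad()
    loss.backward()
    policy_opt.step()

print("sharing pattern:", (torch.sigmoid(logits) > 0.5).int().tolist())
```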
lanes
The lane for understanding what actually dominates cost and learning dynamics when training compact language models.
papers
Paper note on stabilizing 1-bit weight-and-activation training through better low-bit distribution fitting and more trustworthy gradients.
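For orientation, a minimal sketch of the baseline this line of work improves on, in the BitNet style: sign() forward with a mean-|w| scale for the weights, and a clipped straight-through gradient for activations so out-of-range values stop contributing gradient. The paper's specific distribution-fitting and gradient stabilizers are not reproduced here.

```python
# A sketch only: 1-bit weights and activations with straight-through
# estimators; BitNet-style scaling, not the paper's exact recipe.
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    scale = w.abs().mean()          # fit a single scale to |w|
    b = torch.sign(w) * scale       # note: sign(0) = 0 is left unhandled here
    return w + (b - w).detach()     # straight-through estimator

def binarize_acts(x: torch.Tensor) -> torch.Tensor:
    x_clipped = x.clamp(-1.0, 1.0)  # clip so the STE gradient is 0 outside [-1, 1]
    b = torch.sign(x_clipped)
    return x_clipped + (b - x_clipped).detach()

class BitLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(
            binarize_acts(x), binarize_weights(self.weight), self.bias)
```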
papers
Paper note on why compact-model training has a different systems bottleneck profile than many big-model intuitions suggest.
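A back-of-envelope illustration of the shifted bottleneck (all numbers below are assumed, not from the note): per-token compute shrinks with parameter count, while per-step overheads such as kernel launches, data loading, and communication do not, so small models spend a much larger fraction of each step off the matmul units.

```python
# A sketch only: fraction of a training step eaten by fixed overhead,
# using the ~6N FLOPs/token rule; all constants are assumptions.
GPU_FLOPS = 300e12           # assumed sustained bf16 throughput per device
FIXED_OVERHEAD_S = 0.010     # assumed per-step overhead: launches, dataloader, allreduce
TOKENS_PER_MICROBATCH = 8192

for n_params in (125e6, 1.3e9, 70e9):
    compute_s = 6 * n_params * TOKENS_PER_MICROBATCH / GPU_FLOPS
    overhead_frac = FIXED_OVERHEAD_S / (compute_s + FIXED_OVERHEAD_S)
    print(f"{n_params/1e9:6.2f}B params: compute {compute_s:7.3f}s/step, "
          f"fixed overhead {overhead_frac:5.1%} of step")
```

Under these assumed numbers, a 125M-parameter model loses roughly a third of each step to fixed overhead while a 70B model loses a negligible fraction, which is the flavor of bottleneck inversion the note discusses.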