9 items with this tag.
frontiers
Frontier synthesis on the recent shift away from handcrafted post-training compression formats toward training-time rules and learned representations that directly target compressible model weights.
papers
Paper note on integrating rate-constrained compression pressure directly into LLM training rather than treating compression only as a post-training step.
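To make the idea concrete, here is a minimal sketch of one way rate pressure could enter the training loss: a differentiable per-weight bit-cost proxy added to the task objective. The proxy itself, the names `rate_penalty`, `LAMBDA`, and `DELTA`, and the heavy-tailed log-cost form are my assumptions, not the paper's construction.

```python
# A sketch only: rate-constrained training via a differentiable bit-cost
# proxy. LAMBDA and DELTA are illustrative hyperparameters (assumed).
import torch

LAMBDA = 1e-5    # assumed trade-off between task loss and rate pressure
DELTA = 2 ** -6  # assumed quantizer step the rate proxy is measured against

def rate_penalty(model: torch.nn.Module) -> torch.Tensor:
    """log2(1 + |w|/DELTA) roughly tracks the bit cost of each weight
    under a uniform quantizer with step DELTA and a heavy-tailed prior."""
    return sum(torch.log2(1.0 + p.abs() / DELTA).sum()
               for p in model.parameters())

def training_step(model, batch, task_loss_fn, optimizer):
    optimizer.zero_grad()
    loss = task_loss_fn(model, batch) + LAMBDA * rate_penalty(model)
    loss.backward()
    optimizer.step()
    return loss.item()
```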
papers
Paper note on making LLM training explicitly produce more low-rank, compressible weights by constraining Muon updates with a nuclear-norm budget.
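As a sketch of what a nuclear-norm budget on an update could look like: Euclidean projection onto the ball {U : ‖U‖* ≤ τ}, which amounts to soft-thresholding the update's singular values. Applying this to Muon's orthogonalized update, and the names and τ below, are my assumptions rather than the paper's exact rule.

```python
# A sketch only: clip an update matrix's nuclear norm to a budget tau by
# projecting its spectrum onto an L1 ball (Duchi-style projection).
import torch

def project_l1_ball(v: torch.Tensor, tau: float) -> torch.Tensor:
    """Project a nonnegative, descending-sorted vector onto {x >= 0, sum x <= tau}."""
    if v.sum() <= tau:
        return v
    cumsum = torch.cumsum(v, dim=0)
    k = torch.arange(1, v.numel() + 1, device=v.device)
    rho = torch.nonzero(v - (cumsum - tau) / k > 0).max()  # last active index
    theta = (cumsum[rho] - tau) / (rho + 1)
    return torch.clamp(v - theta, min=0.0)

def nuclear_norm_project(update: torch.Tensor, tau: float) -> torch.Tensor:
    """Clip the update's nuclear norm to tau, biasing it toward low rank."""
    U, S, Vh = torch.linalg.svd(update, full_matrices=False)
    S_proj = project_l1_ball(S, tau)  # S from svd is already sorted descending
    return (U * S_proj) @ Vh
```

Because the projection removes mass from the trailing singular values first, repeated application biases the accumulated weights toward low rank, which is the compressibility the note describes.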
moonshots
Moonshot hypothesis that models should be trained to survive multiple distinct artifact failure modes rather than a single clean quantization path.
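One concrete reading of "multiple failure modes", sketched below: sample a different fake deployment corruption each training step and backpropagate through it with a straight-through estimator. The particular modes here (int8, int4, 2:4 sparsity) are my illustrative picks, not the note's list.

```python
# A sketch only: train through one randomly sampled artifact failure mode
# per step, rather than a single clean quantization path.
import random
import torch

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    return w + (q - w).detach()  # straight-through gradient

def two_four_sparse(w: torch.Tensor) -> torch.Tensor:
    """Zero the 2 smallest-magnitude entries in each group of 4.
    Assumes w.numel() is divisible by 4."""
    groups = w.reshape(-1, 4)
    idx = groups.abs().topk(2, dim=1, largest=False).indices
    mask = torch.ones_like(groups).scatter(1, idx, 0.0)
    masked = (groups * mask).reshape(w.shape)
    return w + (masked - w).detach()

ARTIFACT_MODES = [
    lambda w: fake_quant(w, bits=8),
    lambda w: fake_quant(w, bits=4),
    two_four_sparse,
]

def corrupted_weight(w: torch.Tensor) -> torch.Tensor:
    return random.choice(ARTIFACT_MODES)(w)  # one failure mode per call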
moonshots
Moonshot hypothesis that the model should be trained directly to become a good compressed artifact, not merely a good floating-point checkpoint.
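A minimal sketch of that framing, assuming the deployment artifact is per-channel int4: the float weights act only as latent parameters, and the forward pass (hence the loss) always goes through the compressed form, so gradient descent grades the artifact itself.

```python
# A sketch only: the layer always computes with its compressed weights;
# per-channel int4 is an illustrative format choice, not the note's.
import torch

class CompressedLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        qmax = 7  # int4 symmetric range
        scale = self.weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
        q = (self.weight / scale).round().clamp(-8, 7) * scale
        w_artifact = self.weight + (q - self.weight).detach()  # STE
        return torch.nn.functional.linear(x, w_artifact, self.bias)
```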
papers
Paper note on using reinforcement learning during training to decide which transformer layers should share weights and which should remain independent.
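As a sketch of how RL could pick a sharing pattern: a Bernoulli policy per layer ("tie this layer's weights to the previous one" vs "keep it independent"), scored by a reward trading quality against parameter count, updated with REINFORCE. The action space, the reward shape, and the placeholder below are assumptions, not the paper's design.

```python
# A sketch only: REINFORCE over per-layer weight-sharing decisions.
import torch

n_layers = 12
logits = torch.zeros(n_layers - 1, requires_grad=True)  # layer 0 always independent
policy_opt = torch.optim.Adam([logits], lr=0.05)

def reward(share_mask: torch.Tensor) -> float:
    """Placeholder: should run a short proxy-training and return
    -(val_loss + size_cost). Here: a fake reward favoring ~50% sharing."""
    return -abs(share_mask.float().mean().item() - 0.5)

for step in range(200):
    dist = torch.distributions.Bernoulli(logits=logits)
    share_mask = dist.sample()                   # 1 = share with previous layer
    loss = -reward(share_mask) * dist.log_prob(share_mask).sum()  # REINFORCE
    policy_opt.zero_grad()
    loss.backward()
    policy_opt.step()

print("sharing pattern:", (torch.sigmoid(logits) > 0.5).int().tolist())
```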
lanes
The lane for understanding what actually dominates cost and learning dynamics when training compact language models.
papers
Paper note on stabilizing 1-bit weight-and-activation training through better low-bit distribution fitting and more trustworthy gradients.
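For orientation, a minimal sketch of the baseline this line of work improves on, in the BitNet style: sign() forward with a mean-|w| scale for the weights, and a clipped straight-through gradient for activations so out-of-range values stop contributing gradient. The paper's specific distribution-fitting and gradient stabilizers are not reproduced here.

```python
# A sketch only: 1-bit weights and activations with straight-through
# estimators; BitNet-style scaling, not the paper's exact recipe.
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    scale = w.abs().mean()          # fit a single scale to |w|
    b = torch.sign(w) * scale       # note: sign(0) = 0 is left unhandled here
    return w + (b - w).detach()     # straight-through estimator

def binarize_acts(x: torch.Tensor) -> torch.Tensor:
    x_clipped = x.clamp(-1.0, 1.0)  # clip so the STE gradient is 0 outside [-1, 1]
    b = torch.sign(x_clipped)
    return x_clipped + (b - x_clipped).detach()

class BitLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(
            binarize_acts(x), binarize_weights(self.weight), self.bias)
```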
papers
Paper note on why compact-model training has a different systems bottleneck profile than many big-model intuitions suggest.
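A back-of-envelope illustration of the shifted bottleneck (all numbers below are assumed, not from the note): per-token compute shrinks with parameter count, while per-step overheads such as kernel launches, data loading, and communication do not, so small models spend a much larger fraction of each step off the matmul units.

```python
# A sketch only: fraction of a training step eaten by fixed overhead,
# using the ~6N FLOPs/token rule; all constants are assumptions.
GPU_FLOPS = 300e12           # assumed sustained bf16 throughput per device
FIXED_OVERHEAD_S = 0.010     # assumed per-step overhead: launches, dataloader, allreduce
TOKENS_PER_MICROBATCH = 8192

for n_params in (125e6, 1.3e9, 70e9):
    compute_s = 6 * n_params * TOKENS_PER_MICROBATCH / GPU_FLOPS
    overhead_frac = FIXED_OVERHEAD_S / (compute_s + FIXED_OVERHEAD_S)
    print(f"{n_params/1e9:6.2f}B params: compute {compute_s:7.3f}s/step, "
          f"fixed overhead {overhead_frac:5.1%} of step")
```

Under these assumed numbers, a 125M-parameter model loses roughly a third of each step to fixed overhead while a 70B model loses a negligible fraction, which is the flavor of bottleneck inversion the note discusses.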