2 items with this tag.
papers
Paper note on the language-model head as an optimization bottleneck, not only a storage bottleneck.
papers
Paper note on compressing the language-model output bottleneck by replacing the full logits projection with a compact codebook.