5 items tagged "lanes".
When spending extra compute at evaluation time can beat storing more parameters.
The lane focused on closing the gap between train-time weights and the final compressed artifact.
Why parameter sharing may be the cleanest way to buy width, extra compute, or light specialization under a hard artifact cap.
Tokenization is part of the budget story, not just a preprocessing detail.
The lane for understanding what actually dominates cost and learning dynamics when training compact language models.