4 items with this tag.
- papers: Paper note on preserving a tiny set of outlier-sensitive weight columns in high precision while quantizing the rest of the model aggressively.
- ideas: Hypothesis that most head-side quantization damage is concentrated in a tiny set of difficult token rows, making row-level protection a better byte trade than uniform head precision.
- papers: Paper note on decoupled low-bit training with a tiny high-precision branch for the parameters that matter most.
- notes: Synthesis note on the recurring idea that a small subset of sensitive parameters deserves better precision than the rest.