5 items tagged "ideas".
Hypothesis that most head-side quantization damage is concentrated in a tiny set of difficult token rows, making row-level protection a better byte trade than uniform head precision.
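A minimal sketch of the trade, assuming symmetric per-row int4 quantization and using per-row reconstruction error as a stand-in for "difficulty"; `protect_fraction` and the helper names are illustrative, not from the note itself.

```python
import torch

def quantize_rows_int4(w: torch.Tensor) -> torch.Tensor:
    """Symmetric per-row 4-bit quantization; returns the dequantized rows."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = (w / scale).round().clamp(-8, 7)
    return q * scale

def row_protected_head(w: torch.Tensor, protect_fraction: float = 0.02):
    """Quantize every row, then keep the worst-hit rows in full precision."""
    deq = quantize_rows_int4(w)
    err = (w - deq).pow(2).sum(dim=1)      # per-token-row quantization damage
    k = max(1, int(protect_fraction * w.shape[0]))
    protected = err.topk(k).indices        # the "difficult" token rows
    deq[protected] = w[protected]          # spend full-precision bytes only here
    return deq, protected

vocab, d_model = 32_000, 1024
w = torch.randn(vocab, d_model)
w_mixed, rows = row_protected_head(w)
print(f"protected {rows.numel()} of {vocab} rows")
```

Protecting 2% of rows costs roughly 2% of the fp16 head bytes, versus the uniform alternative of raising precision for the entire matrix.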
Hypothesis that one small learned codebook bank shared across repeated blocks can beat per-matrix quantization by amortizing metadata and aligning compression with shared-depth structure.
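A minimal sketch, assuming plain k-means vector quantization over 4-dim weight sub-vectors pooled from every repeated block, so one codebook's metadata is amortized across all of them; the sizes and function names are illustrative.

```python
import torch

def fit_codebook(samples: torch.Tensor, k: int = 256, iters: int = 10):
    """Plain k-means over weight sub-vectors."""
    centers = samples[torch.randperm(samples.shape[0])[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(samples, centers).argmin(dim=1)
        for j in range(k):
            sel = samples[assign == j]
            if sel.numel():
                centers[j] = sel.mean(dim=0)
    return centers

def encode(w: torch.Tensor, centers: torch.Tensor, group: int = 4):
    flat = w.reshape(-1, group)
    return torch.cdist(flat, centers).argmin(dim=1)   # byte-sized indices

blocks = [torch.randn(128, 128) for _ in range(8)]    # repeated-block weights
pool = torch.cat([b.reshape(-1, 4) for b in blocks])  # shared training pool
codebook = fit_codebook(pool)                         # one bank for all blocks
codes = [encode(b, codebook) for b in blocks]         # per-block indices only
```

Per-matrix schemes pay for scales or codebooks once per matrix; here the eight repeated blocks share one 256x4 table, so the metadata cost is divided by eight.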
Hypothesis that shrinking the tokenizer and LM-head byte burden, then reinvesting the saved bytes into a wider shared backbone, beats spending the same budget on a larger static head.
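Back-of-the-envelope arithmetic for the trade, under assumed sizes: fp16 storage, a weight-tieable embedding/head, and roughly 12*d^2 parameters per transformer block. Both configurations are hypothetical.

```python
BYTES = 2  # fp16

def model_bytes(vocab, d, unique_blocks, tied_head=True):
    embed = vocab * d * (1 if tied_head else 2)   # embedding (+ head if untied)
    backbone = unique_blocks * 12 * d * d         # ~12*d^2 params per block
    return (embed + backbone) * BYTES

# A: big vocab, separate static head, 12 distinct blocks.
a = model_bytes(128_000, 768, 12, tied_head=False)
# B: small vocab, tied head, 4 shared blocks unrolled for 12 steps,
#    with the saved bytes spent on width instead.
b = model_bytes(32_000, 1152, 4, tied_head=True)
print(f"A: {a/1e6:.0f} MB, B: {b/1e6:.0f} MB")    # B is smaller *and* wider
```

Under these assumptions, matching A's full budget would let option B push width to roughly d = 2100 with the same four shared blocks.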
Hypothesis that shared-depth models can recover most layer-role specialization using only per-step RMSNorm and tiny channel gates, with almost no byte cost.
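A minimal sketch, assuming one weight-tied block steered per step by its own RMSNorm scale and a sigmoid channel gate; the toy MLP stands in for a full transformer block, and the 2*d-per-step parameter cost is the point.

```python
import torch
import torch.nn as nn

class SharedDepth(nn.Module):
    def __init__(self, d=512, steps=12):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(),
                                   nn.Linear(4 * d, d))    # one shared block
        # per-step steering parameters: only 2*d values each step
        self.norm_scale = nn.Parameter(torch.ones(steps, d))
        self.gate = nn.Parameter(torch.zeros(steps, d))

    def forward(self, x):
        for t in range(self.norm_scale.shape[0]):
            h = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6)
            h = h * self.norm_scale[t]                     # per-step RMSNorm
            x = x + torch.sigmoid(self.gate[t]) * self.block(h)  # gated residual
        return x

m = SharedDepth()
print(m(torch.randn(2, 16, 512)).shape)
```

The steering parameters total steps * 2 * d values, which next to the ~12*d^2 of even one distinct block is effectively free in bytes.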
Hypothesis that a compact shared-depth model should spend extra inference-time passes only on uncertain positions, turning compute into quality more efficiently than storing more static depth.
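A minimal sketch, assuming softmax entropy of the current logits as the uncertainty signal and a fixed extra-pass budget; the threshold `tau` and the stand-in block are illustrative choices, not the note's recipe.

```python
import torch
import torch.nn as nn

d, vocab, base_steps, extra_steps, tau = 512, 1000, 4, 4, 6.0
block = nn.Sequential(nn.Linear(d, d), nn.GELU())    # stand-in shared block
head = nn.Linear(d, vocab)

def entropy(logits):
    p = logits.softmax(-1)
    return -(p * p.clamp(min=1e-9).log()).sum(-1)    # nats, per position

x = torch.randn(2, 32, d)
for _ in range(base_steps):                          # every position gets these
    x = x + block(x)
unsure = entropy(head(x)) > tau                      # positions still uncertain
for _ in range(extra_steps):                         # only they pay more compute
    x = torch.where(unsure[..., None], x + block(x), x)
print(f"refined {unsure.float().mean():.0%} of positions")
```

Byte cost stays fixed at the one shared block; only the uncertain positions pay the extra passes, which is the sense in which compute is traded for quality instead of stored depth.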