(Lee et al., 2024)

Sources: arXiv:2306.02272 · alphaXiv overview

Core contribution

OWQ identifies a small set of weak columns: weight columns whose quantization error matters disproportionately because they are tied to activation outliers. It keeps those columns in higher precision and lets the rest of the dense matrix fall to very low bit-width, then builds fine-tuning around the same protected subset.
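The selection step above can be sketched as a per-column sensitivity ranking. The proxy below (per-channel activation energy times per-column round-to-nearest quantization error) and all names are illustrative assumptions; the paper derives its own sensitivity criterion, so treat this as a sketch of the idea, not the OWQ algorithm.

```python
import numpy as np

def select_weak_columns(W, X, k, n_bits=3):
    """Rank columns of W (out_features, in_features) by a sensitivity proxy.

    Proxy: sensitivity[j] ~ E[x_j^2] * ||round-to-nearest error of W[:, j]||^2,
    i.e. columns multiplied by outlier activations amplify their own error.
    (Illustrative stand-in for the paper's criterion.)
    """
    # Per-channel activation second moment from calibration inputs X (n, in).
    act_energy = (X ** 2).mean(axis=0)

    # Per-column symmetric round-to-nearest quantization at the target width.
    scale = np.abs(W).max(axis=0) / (2 ** (n_bits - 1) - 1)
    scale = np.where(scale == 0, 1.0, scale)
    Wq = np.round(W / scale) * scale
    col_err = ((W - Wq) ** 2).sum(axis=0)

    sensitivity = act_energy * col_err
    # Keep the top-k most sensitive ("weak") columns in full precision.
    weak = np.argsort(sensitivity)[-k:]
    return weak, Wq

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(64, 16))
X[:, 3] *= 20.0                 # synthetic activation outlier in channel 3
weak, _ = select_weak_columns(W, X, k=2)
print(sorted(int(c) for c in weak))   # channel 3 should rank among the weak columns
```

The point of the toy example is the clustering claim: one outlier input channel makes one whole weight column disproportionately sensitive.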

Why this matters for Parameter Golf

This is one of the most directly challenge-native papers in the graph. It suggests the right answer may not be “protect some whole tensors,” but “protect a tiny structured slice whose byte cost is small and whose damage if quantized is large.”

What to import

  • Outliers induce structure. The most important exceptions may cluster by column rather than looking like random isolated weights.
  • Tiny protected subsets can dominate quality. You do not need much fp16 or higher-precision budget if the ranking is good.
  • Protected structure should stay execution-friendly. Column-level exceptions are often easier to reason about than arbitrary sparse masks.
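The execution-friendliness point can be made concrete: a column-protected layout reduces to two ordinary dense matmuls rather than a scatter over an arbitrary sparse mask. The sketch below is an assumed layout for illustration, not the OWQ kernel.

```python
import numpy as np

def mixed_precision_forward(X, W_q, W_fp, mask):
    """y = X @ W.T where protected input channels (mask=True) stay full precision.

    Column-level exceptions stay dense: one matmul over the low-bit channels
    plus one matmul over the handful of protected channels.
    (Illustrative layout sketch, not the paper's implementation.)
    """
    return X[:, ~mask] @ W_q.T + X[:, mask] @ W_fp.T

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))          # (out_features, in_features)
X = rng.normal(size=(4, 16))
mask = np.zeros(16, dtype=bool)
mask[[3, 11]] = True                  # pretend channels 3 and 11 were flagged weak
# With W_q holding the would-be-quantized columns unmodified, the split
# forward must match the plain dense matmul.
y = mixed_precision_forward(X, W[:, ~mask], W[:, mask], mask)
```

Because both paths are dense, the exception structure costs one boolean mask and one extra matmul, which is much easier to reason about (and to ship) than per-weight sparse exceptions.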

What not to over-import

OWQ is still a post-training quantization result on standard LLMs, not proof that the exact same weak-column ranking survives every training setup or final codec. It also does not imply that every protected subset should be column-based; the durable lesson is the existence of a steep saliency tail.

Parameter Golf translation

A good local translation is to test whether the best protected bytes belong in:

  • LM-head rows
  • projection columns
  • a tiny number of channels or scales

rather than in whole tensors. If OWQ-style structure transfers, the right exception path may be much smaller than current coarse passthrough heuristics.
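A quick byte-budget comparison makes the “tiny exception path” claim testable. The helper and the 4096x4096 example shape below are assumptions for illustration, not figures from the paper.

```python
def protected_bytes(shape, unit, n_protected=0, fp_bytes=2):
    """fp16 byte cost of a protected slice of a (rows, cols) weight matrix.

    Illustrative budgeting arithmetic; unit names are assumed for this sketch.
    """
    rows, cols = shape
    if unit == "tensor":
        return rows * cols * fp_bytes
    if unit == "row":
        return n_protected * cols * fp_bytes
    if unit == "column":
        return n_protected * rows * fp_bytes
    raise ValueError(f"unknown unit: {unit}")

# Whole 4096x4096 projection in fp16 vs. 32 protected columns:
whole = protected_bytes((4096, 4096), "tensor")       # 33_554_432 bytes
cols32 = protected_bytes((4096, 4096), "column", 32)  # 262_144 bytes, ~0.8% of whole
```

If the weak-column ranking is good, the column-level exception path here is two orders of magnitude cheaper than whole-tensor passthrough.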

Lee, C., Jin, J., Kim, T., Kim, H., & Park, E. (2024). OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models. arXiv preprint arXiv:2306.02272. https://arxiv.org/abs/2306.02272