Hypothesis

A small, explicitly protected set of high-error or high-sensitivity parameters should buy a disproportionately large improvement in compressed quality relative to its byte cost. (Liao et al., 2025; Zhang et al., 2026)

Why this is plausible

  • pQuant argues that extremely low-bit models fail when all parameters are treated too uniformly.
  • ClusComp shows outliers increasingly dominate quantization difficulty in newer LLMs.
  • Our challenge has hard storage limits but often some residual headroom, making a tiny high-precision side channel attractive.
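To make "tiny side channel" concrete, a back-of-envelope cost estimate (illustrative numbers, not taken from either paper): protecting 0.1% of a 1B-parameter model with fp16 values plus uint32 indices adds about 6 MB on top of a 4-bit base model.

```python
# Hypothetical byte-cost estimate for an fp16 side channel.
# Assumed sizes: 1B parameters, 0.1% protected, 4-bit base model.
n_params = 1_000_000_000
protected = int(0.001 * n_params)    # 1e6 protected weights
side_channel = protected * (2 + 4)   # fp16 value (2 B) + uint32 index (4 B)
base = n_params * 4 // 8             # 4-bit base model, in bytes
overhead = side_channel / base
print(f"side channel: {side_channel / 1e6:.1f} MB, overhead: {overhead:.2%}")
# → side channel: 6.0 MB, overhead: 1.20%
```

At roughly 1% overhead, the hypothesis only pays off if those protected weights recover more quality than spending the same bytes on a uniformly higher bit width.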

Candidate implementations

  • sparse fp16 residuals for the largest quantization errors
  • mixed-precision protection for selected tensors or rows
  • decoupled branch designs that keep most weights cheap but preserve a small sensitive subset
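The first candidate can be sketched in a few lines. This is a minimal illustration, not an implementation from either cited paper: symmetric 4-bit uniform quantization of one tensor, with the k largest-magnitude quantization errors stored as an fp16 residual side channel (values plus uint32 indices).

```python
# Sketch of the "sparse fp16 residuals" candidate (assumed scheme:
# symmetric uniform quantization; top-k errors kept at higher precision).
import numpy as np

def quantize_with_residuals(w, bits=4, k=16):
    # Symmetric uniform quantization to `bits` bits.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.clip(np.round(w / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    err = (w - q * scale).ravel()
    # Protect the k entries with the largest quantization error.
    idx = np.argsort(-np.abs(err))[:k]
    residuals = err[idx].astype(np.float16)
    return q.astype(np.int8), scale, idx.astype(np.uint32), residuals

def reconstruct(q, scale, idx, residuals):
    # Dequantize, then patch the protected entries with their residuals.
    w = (q.astype(np.float32) * scale).ravel()
    w[idx] += residuals.astype(np.float32)
    return w

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, s, idx, res = quantize_with_residuals(w, bits=4, k=64)
base = np.abs(w.ravel() - (q.astype(np.float32) * s).ravel()).max()
fixed = np.abs(w.ravel() - reconstruct(q, s, idx, res)).max()
assert fixed < base  # patching the worst entries shrinks the max error
```

Note that for round-to-nearest quantization the errors are roughly uniform, so a small k mainly trims the tail; the hypothesis is that real sensitivity is far more concentrated than this toy example suggests.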

Risks

  • index and metadata overhead outweighs the quality gain
  • protected weights improve nominal loss but not the final artifact score
  • selection heuristics overfit to a narrow data or model regime

References

Liao, B., Herold, C., Hashemi, S. H., Vasilev, S., Khadivi, S., & Monz, C. (2025). ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning. arXiv Preprint arXiv:2503.13089. https://arxiv.org/abs/2503.13089
Zhang, W., Liu, B., Hu, Y., Bai, X., Zhang, W., & Cui, B. (2026). pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training. arXiv Preprint arXiv:2602.22592. https://arxiv.org/abs/2602.22592