Core idea
Extremely low-bit models often fail because they compress everything uniformly, spending the same tiny bit budget on parameters with very different sensitivity. A recurring fix is to reserve a small amount of high-precision capacity for the parameters that matter most.
Why this keeps showing up
- pQuant explicitly frames the problem as parameter democratization and uses a tiny high-precision branch to restore sensitivity structure. (Zhang et al., 2026)
- ClusComp argues that outliers increasingly dominate the difficulty of compressing modern LLMs. (Liao et al., 2025)
- MicroScopiQ reaches a similar conclusion from a hardware-aware perspective: outliers need special treatment, but the treatment must still be deployment-friendly. (Ramachandran et al., 2024)
- PTQ1.61 shows that even post-training methods often succeed or fail based on how they isolate and protect salient structure. (Zhao et al., 2025)
Parameter Golf translation
The important insight is not “use fp16 somewhere.” It is:
- identify where uniform low-bit treatment is most destructive
- spend bytes only there
- keep the majority path extremely cheap
That is the core intuition behind sparse outlier preservation.
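A minimal sketch of that intuition in numpy (the function names and the top-magnitude outlier criterion are ours, illustrative only, not the method of any cited paper): quantize the bulk of a weight tensor to a cheap uniform low-bit grid, but carry the small fraction of largest-magnitude weights as an exact high-precision sparse residual.

```python
import numpy as np

def quantize_with_outliers(w: np.ndarray, bits: int = 2, outlier_frac: float = 0.01):
    """Uniform low-bit quantization with a sparse full-precision outlier set.

    Illustrative sketch: "important" is approximated here as top magnitude;
    real methods use richer saliency signals.
    """
    flat = w.ravel()
    k = max(1, int(outlier_frac * flat.size))
    # Indices of the k largest-magnitude weights: these keep full precision.
    outlier_idx = np.argpartition(np.abs(flat), -k)[-k:]
    dense = flat.copy()
    dense[outlier_idx] = 0.0  # outliers are carried separately

    # Symmetric uniform quantization of the remaining majority path.
    levels = 2 ** (bits - 1) - 1          # e.g. 1 level per side at 2-bit
    scale = np.abs(dense).max() / levels if levels > 0 else 1.0
    q = np.clip(np.round(dense / scale), -levels, levels).astype(np.int8)
    return q, scale, outlier_idx, flat[outlier_idx]

def dequantize(q, scale, outlier_idx, outlier_vals, shape):
    flat = q.astype(np.float32) * scale
    flat[outlier_idx] = outlier_vals      # restore exact outlier values
    return flat.reshape(shape)
```

The byte accounting is the point: at 1% outliers, the sparse branch costs roughly 0.16 extra bits per weight (fp16 value plus index), while the majority path stays at the low-bit rate.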
Design space
This can show up as:
- sparse residual deltas
- protected rows or tensors
- decoupled branches
- clustering or codebook methods that avoid scalar uniformity
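The last point can be sketched concretely. A tiny 1-D k-means codebook (a stand-in for the clustering family, not the algorithm of ClusComp or any other cited paper) places its entries where the weight density actually is, rather than on a uniform scalar grid:

```python
import numpy as np

def kmeans_codebook(w: np.ndarray, k: int = 4, iters: int = 20):
    """Fit a k-entry scalar codebook to the weights via 1-D k-means.

    Illustrative sketch: quantile initialization and plain Lloyd updates.
    Returns the codebook and the per-weight code assignments.
    """
    flat = w.ravel().astype(np.float64)
    # Start centroids at evenly spaced quantiles of the weight distribution.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign each weight to its nearest codebook entry.
        codes = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # Move each entry to the mean of the weights it covers.
        for j in range(k):
            members = flat[codes == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, codes.reshape(w.shape)
```

With k = 4 this is still a 2-bit representation per weight, but the grid adapts to the (typically bell-shaped, heavy-tailed) weight distribution instead of forcing uniform spacing.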
Related
Liao, B., Herold, C., Hashemi, S. H., Vasilev, S., Khadivi, S., & Monz, C. (2025). ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning. arXiv preprint arXiv:2503.13089. https://arxiv.org/abs/2503.13089
Ramachandran, A., Kundu, S., & Krishna, T. (2024). MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization. arXiv preprint arXiv:2411.05282. https://arxiv.org/abs/2411.05282
Zhang, W., Liu, B., Hu, Y., Bai, X., Zhang, W., & Cui, B. (2026). pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training. arXiv preprint arXiv:2602.22592. https://arxiv.org/abs/2602.22592
Zhao, J., Zhang, M., Wang, M., Shang, Y., Zhang, K., Guan, W., Wang, Y., & Zhang, M. (2025). PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models. arXiv preprint arXiv:2502.13179. https://arxiv.org/abs/2502.13179