This index is a working research shelf for Parameter Golf, not a comprehensive bibliography. The goal is to keep the papers that most directly sharpen current lanes, hypotheses, and implementation notes.

Quantization, outliers, and compression-aware training

Best read alongside the notes on quantization and outlier handling, outlier-aware compression, decoupled precision, and normalization before projections. A minimal sketch of the outlier problem follows.
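A numpy sketch of the core failure this cluster circles: one outlier channel inflates a shared absmax scale and wastes the low-bit grid for every other channel, which is why per-channel scales and pre-projection normalization (as in Steinmetz et al., 2025) keep recurring. All shapes and numbers here are illustrative, not taken from any one paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(4, 8))
x[:, 3] *= 50.0  # inject one outlier channel, a common LLM activation pattern

def fake_quantize_int8(t, scale):
    """Round to the int8 grid at the given scale, then dequantize."""
    q = np.clip(np.round(t / scale), -127, 127)
    return q * scale

# Per-tensor: one absmax scale shared by all channels, dominated by the outlier.
per_tensor = fake_quantize_int8(x, np.abs(x).max() / 127.0)

# Per-channel: each column gets its own scale, so the outlier stays local.
per_channel = fake_quantize_int8(x, np.abs(x).max(axis=0, keepdims=True) / 127.0)

print("per-tensor MSE :", np.mean((x - per_tensor) ** 2))
print("per-channel MSE:", np.mean((x - per_channel) ** 2))
```

The per-tensor error is orders of magnitude worse, which is the whole case for outlier-aware scale placement.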

Recursive sharing, recurrent depth, and parameter reuse

Best read alongside the notes on recursive and shared-parameter architectures, recursive width scaling, recurrent wide architecture, and recursive layer sharing. See the sketch below for the parameter arithmetic.
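To make the parameter accounting concrete, here is a minimal sketch of recursive layer sharing with a per-depth low-rank delta, loosely in the spirit of the layer-wise LoRA relaxation in Bae et al. (2024). The tanh block, shapes, and names are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank, steps = 16, 2, 4

W_shared = rng.normal(0, 0.1, (d, d))        # one block, reused at every depth
lora = [(rng.normal(0, 0.01, (d, rank)),      # per-depth A_k
         rng.normal(0, 0.01, (rank, d)))      # per-depth B_k
        for _ in range(steps)]

def block(h, W):
    return h + np.tanh(h @ W)                 # residual "layer" stand-in

h = rng.normal(0, 1.0, (1, d))
for A, B in lora:                             # K recursions over shared weights
    h = block(h, W_shared + A @ B)            # relaxed: shared core + cheap delta

# One d*d core plus K small rank-r deltas, vs. K full independent layers.
print("shared params:", d * d + steps * 2 * d * rank,
      "vs. unshared:", steps * d * d)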

Tokenizer, vocabulary, and output-head efficiency

This cluster connects most strongly to the notes on tokenizer and vocabulary efficiency, and on tokenizer efficiency.
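A quick sketch of the largest single lever here: tying the output head to the input embedding, so the separate |V| x d unembedding matrix disappears. Purely illustrative numpy; the vocabulary size and width are made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 32000, 256

E = rng.normal(0, 0.02, (vocab, d))   # input embedding, also the output head
token_ids = np.array([5, 17, 256])

h = E[token_ids]                      # embed (stand-in for the transformer body)
logits = h @ E.T                      # tied head: no separate unembedding matrix

tied, untied = vocab * d, 2 * vocab * d
print(f"vocab params: tied {tied:,} vs. untied {untied:,}")
print("logits shape:", logits.shape)
```

At small model scales the embedding block dominates the parameter count, so this halving of vocab-related parameters is often the cheapest win in the cluster.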

Compute budgeting and inference-time tradeoffs

This is the paper trail behind training economics and evaluation-time compute.
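For budget arithmetic, this cluster leans on the standard approximation C ~= 6ND (training FLOPs ~= 6 x parameters x tokens). A tiny calculator, with a made-up budget and model sizes, shows the tradeoff it encodes:

```python
# Back-of-envelope training-compute identity: FLOPs ~= 6 * N * D.
# Budget and candidate sizes below are illustrative, not from any cited paper.
def train_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

budget = 1e21                              # fixed FLOP budget
for params in (1e8, 5e8, 1e9):             # candidate model sizes N
    tokens = budget / (6.0 * params)       # tokens D affordable at this size
    print(f"N={params:.0e}  ->  D={tokens:.2e} tokens "
          f"({tokens / params:.0f} tokens/param)")
```

Under a fixed budget, every doubling of N halves the affordable D, which is the tension the evaluation-time-compute papers then push on from the inference side.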

Useful synthesis paths

Meta

Bae, S., Fisch, A., Harutyunyan, H., Ji, Z., Kim, S., & Schuster, T. (2024). Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA. arXiv preprint arXiv:2410.20672. https://arxiv.org/abs/2410.20672
Liao, B., Herold, C., Hashemi, S. H., Vasilev, S., Khadivi, S., & Monz, C. (2025). ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning. arXiv preprint arXiv:2503.13089. https://arxiv.org/abs/2503.13089
Steinmetz, C., Childress, G., Herbst, A., Jones, G., Singh, J., Vang, E., & Weinstock, K. (2025). An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits. arXiv preprint arXiv:2505.08823. https://arxiv.org/abs/2505.08823
Young, S. I. (2025). Radio: Rate-Distortion Optimization for Large Language Model Compression. arXiv preprint arXiv:2505.03031. https://arxiv.org/abs/2505.03031
Zhang, W., Liu, B., Hu, Y., Bai, X., Zhang, W., & Cui, B. (2026). pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training. arXiv preprint arXiv:2602.22592. https://arxiv.org/abs/2602.22592

31 items under this folder.