This Quartz garden is the research layer for our Parameter Golf work.
It separates three things that should not be conflated:
- the public challenge record in challenge history
- the repository’s own local experiment lineage in local experiment history
- the main conceptual graph of lanes, hypotheses, frontiers, ideas, and papers
If you are looking for harness, workflow, or editorial material, start in the separate meta layer.
Choose a route
Core research
- Challenge overview
- Challenge history
- Local experiment history
- Research lanes
- Hypothesis ledger
- Research frontiers
- Research ideas
- Paper index
- Research atlas
- Map of content
Project operations / meta
Boundary of the main garden
The top-level garden should stay centered on:
- challenge constraints and evaluation framing
- public-record interpretation
- local experiment evidence
- model-design lanes and mechanism tradeoffs
- hypotheses worth testing
- paper synthesis and literature context
Harness rules, benchmark procedure, background execution ideas, and KB editorial planning belong in meta.
Current active questions
- Can an extra RMSNorm in front of each projection reliably improve post-roundtrip quality in our local benchmark? (Steinmetz et al., 2025; see the first sketch after this list)
- Can sparse or decoupled high-precision outlier preservation convert artifact headroom into better compressed quality? (Zhang et al., 2026; second sketch below)
- Is more aggressive parameter sharing the best route to buying width under the 16 MB cap, and does a recurrent wide architecture make that trade concrete? (Bae et al., 2024; Csordás et al., 2024; Üyük et al., 2024; third sketch below)
- Are tokenizer and vocabulary choices underexplored relative to architecture tweaks? (Gu et al., 2024; Lotz et al., 2025; Vennam et al., 2024; fourth sketch below)
- Where does evaluation-time compute become a better trade than storing more parameters? (Wu et al., 2024)
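A minimal sketch of the extra-RMSNorm placement from Steinmetz et al. (2025), in PyTorch. The module names are ours, and this shows only where the norm goes, not the paper's full 1.58-bit fine-tuning recipe:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Standard RMSNorm: rescale by reciprocal RMS, then apply a learned gain."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight

class NormedProjection(nn.Module):
    """A linear projection with an extra RMSNorm in front, so a quantized
    (or otherwise lossy) weight sees activations at a controlled scale."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.norm = RMSNorm(d_in)
        self.proj = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.norm(x))
```

The open question is whether this placement still pays off under our own compression roundtrip rather than the paper's setting.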
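For the outlier question, a sketch of the generic decouple-then-quantize pattern: keep a small fraction of large-magnitude weights at full precision and fake-quantize the rest. The `frac` knob and the symmetric int8 body are illustrative choices, not necessarily pQuant's actual scheme:

```python
import torch

def split_outliers(w: torch.Tensor, frac: float = 0.005):
    """Split a weight matrix into a quantized dense body plus a sparse
    full-precision outlier residue. frac = fraction of entries kept exact."""
    k = max(1, int(w.numel() * frac))
    thresh = w.abs().flatten().topk(k).values.min()       # kth-largest magnitude
    mask = w.abs() >= thresh
    outliers = torch.where(mask, w, torch.zeros_like(w))  # store sparse, high precision
    body = torch.where(mask, torch.zeros_like(w), w)
    # symmetric int8 fake-quantization of the body
    scale = body.abs().max().clamp(min=1e-8) / 127.0
    body_deq = torch.round(body / scale).clamp(-127, 127) * scale
    return body_deq, outliers

w = torch.randn(512, 512)
body_deq, outliers = split_outliers(w)
w_hat = body_deq + outliers  # large entries are exact; the rest carry int8 error
print((w - w_hat).abs().max())
```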
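For the parameter-sharing question, the budget arithmetic that motivates it. All numbers here are illustrative, and we assume the cap means 16 MiB:

```python
# How many parameters fit under the cap at different storage precisions.
CAP_BYTES = 16 * 2**20  # assuming the 16 MB cap means 16 MiB

def max_params(bytes_per_param: float) -> int:
    return int(CAP_BYTES / bytes_per_param)

for label, bpp in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5), ("~1.58-bit ternary", 1.58 / 8)]:
    print(f"{label:>18}: ~{max_params(bpp) / 1e6:.1f}M params")

# Weight sharing buys effective depth without storing more parameters:
# a block reused T times is stored once but contributes T layers of compute.
d_model, blocks, reuses = 512, 4, 6  # illustrative shape
per_block = 12 * d_model**2          # rough attention + 4x-MLP block, no embeddings
stored = blocks * per_block
print(f"stored ~{stored / 1e6:.1f}M params, effective depth {blocks * reuses}")
```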
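And for the tokenizer question, why vocabulary size is a first-order lever under the cap: with tied input and output embeddings, the table alone costs vocab_size × d_model parameters. The width and vocab sizes below are illustrative:

```python
d_model = 512  # illustrative model width
for vocab in (50_257, 16_384, 4_096):
    emb_params = vocab * d_model  # tied input/output embedding table
    print(f"vocab {vocab:>6}: {emb_params / 1e6:.2f}M params, "
          f"{emb_params / 2**20:.1f} MiB at int8")
# A GPT-2-sized vocabulary (50,257) already exceeds a 16 MiB cap at int8
# before any transformer blocks are counted.
```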
Recommended reading order
- Constraints and scoring
- Challenge history
- Local experiment history
- Research lanes
- Research frontiers
- Hypothesis ledger
- Paper index
- Research ideas
- Moonshots
If you need the harness instead
Use the meta layer for:
- benchmark and operating procedure
- background research execution ideas
- Pi/autoresearch extension notes
- editorial planning for the garden itself
References
Bae, S., Fisch, A., Harutyunyan, H., Ji, Z., Kim, S., & Schuster, T. (2024). Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA. arXiv preprint arXiv:2410.20672. https://arxiv.org/abs/2410.20672
Csordás, R., Irie, K., Schmidhuber, J., Potts, C., & Manning, C. D. (2024). MoEUT: Mixture-of-Experts Universal Transformers. arXiv preprint arXiv:2405.16039. https://arxiv.org/abs/2405.16039
Gu, S., Zhao, M., Zhang, B., Wang, L., Li, J., & Liu, G. (2024). ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model. arXiv preprint arXiv:2410.04335. https://arxiv.org/abs/2410.04335
Lotz, J. F., Lopes, A. V., Peitz, S., Setiawan, H., & Emili, L. (2025). Beyond Text Compression: Evaluating Tokenizers Across Scales. arXiv preprint arXiv:2506.03101. https://arxiv.org/abs/2506.03101
Steinmetz, C., Childress, G., Herbst, A., Jones, G., Singh, J., Vang, E., & Weinstock, K. (2025). An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits. arXiv preprint arXiv:2505.08823. https://arxiv.org/abs/2505.08823
Üyük, C., Lasby, M., Yassin, M., Evci, U., & Ioannou, Y. (2024). Learning Parameter Sharing with Tensor Decompositions and Sparsity. arXiv preprint arXiv:2411.09816. https://arxiv.org/abs/2411.09816
Vennam, S., Joishy, A., & Kumaraguru, P. (2024). LLM Vocabulary Compression for Low-Compute Environments. arXiv preprint arXiv:2411.06371. https://arxiv.org/abs/2411.06371
Wu, Y., Sun, Z., Li, S., Welleck, S., & Yang, Y. (2024). Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models. arXiv preprint arXiv:2408.00724. https://arxiv.org/abs/2408.00724
Zhang, W., Liu, B., Hu, Y., Bai, X., Zhang, W., & Cui, B. (2026). pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training. arXiv preprint arXiv:2602.22592. https://arxiv.org/abs/2602.22592