This Quartz garden is the research layer for our Parameter Golf work.
It separates three things that should not be conflated:
- the public challenge record in challenge history
- the repository’s own local experiment lineage in local experiment history
- the main conceptual graph of lanes, hypotheses, frontiers, ideas, and papers
If you are looking for harness, workflow, or editorial material, start in the separate meta layer.
Choose a route
Core research
- Challenge overview
- Challenge history
- Local experiment history
- Research lanes
- Hypothesis ledger
- Research frontiers
- Research ideas
- Paper index
- Research atlas
- Map of content
Project operations / meta
Boundary of the main garden
The top-level garden should stay centered on:
- challenge constraints and evaluation framing
- public-record interpretation
- local experiment evidence
- model-design lanes and mechanism tradeoffs
- hypotheses worth testing
- paper synthesis and literature context
Harness rules, benchmark procedure, background execution ideas, and KB editorial planning belong in meta.
Current active questions
- Can an extra RMSNorm in front of each projection reliably improve post-roundtrip quality in our local benchmark? (Steinmetz et al., 2025; see the first sketch after this list)
- Can sparse or decoupled high-precision outlier preservation convert artifact headroom into better compressed quality? (Zhang et al., 2026; second sketch below)
- Is more aggressive parameter sharing the best route to buying width under the 16 MB cap, and does a recurrent wide architecture make that trade concrete? (Bae et al., 2024; Csordás et al., 2024; Üyük et al., 2024; third sketch below)
- Are tokenizer and vocabulary choices underexplored relative to architecture tweaks? (Gu et al., 2024; Lotz et al., 2025; Vennam et al., 2024; fourth sketch below)
- Where does evaluation-time compute become a better trade than storing more parameters? (Wu et al., 2024)
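A minimal sketch of the extra-RMSNorm placement from Steinmetz et al. (2025), in PyTorch. The module names are ours, and this shows only where the norm goes, not the paper's full 1.58-bit fine-tuning recipe:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Standard RMSNorm: rescale by reciprocal RMS, then apply a learned gain."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight

class NormedProjection(nn.Module):
    """A linear projection with an extra RMSNorm in front, so a quantized
    (or otherwise lossy) weight sees activations at a controlled scale."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.norm = RMSNorm(d_in)
        self.proj = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.norm(x))
```

The open question is whether this placement still pays off under our own compression roundtrip rather than the paper's setting.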
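For the outlier question, a sketch of the generic decouple-then-quantize pattern: keep a small fraction of large-magnitude weights at full precision and fake-quantize the rest. The `frac` knob and the symmetric int8 body are illustrative choices, not necessarily pQuant's actual scheme:

```python
import torch

def split_outliers(w: torch.Tensor, frac: float = 0.005):
    """Split a weight matrix into a quantized dense body plus a sparse
    full-precision outlier residue. frac = fraction of entries kept exact."""
    k = max(1, int(w.numel() * frac))
    thresh = w.abs().flatten().topk(k).values.min()       # kth-largest magnitude
    mask = w.abs() >= thresh
    outliers = torch.where(mask, w, torch.zeros_like(w))  # store sparse, high precision
    body = torch.where(mask, torch.zeros_like(w), w)
    # symmetric int8 fake-quantization of the body
    scale = body.abs().max().clamp(min=1e-8) / 127.0
    body_deq = torch.round(body / scale).clamp(-127, 127) * scale
    return body_deq, outliers

w = torch.randn(512, 512)
body_deq, outliers = split_outliers(w)
w_hat = body_deq + outliers  # large entries are exact; the rest carry int8 error
print((w - w_hat).abs().max())
```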
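For the parameter-sharing question, the budget arithmetic that motivates it. All numbers here are illustrative, and we assume the cap means 16 MiB:

```python
# How many parameters fit under the cap at different storage precisions.
CAP_BYTES = 16 * 2**20  # assuming the 16 MB cap means 16 MiB

def max_params(bytes_per_param: float) -> int:
    return int(CAP_BYTES / bytes_per_param)

for label, bpp in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5), ("~1.58-bit ternary", 1.58 / 8)]:
    print(f"{label:>18}: ~{max_params(bpp) / 1e6:.1f}M params")

# Weight sharing buys effective depth without storing more parameters:
# a block reused T times is stored once but contributes T layers of compute.
d_model, blocks, reuses = 512, 4, 6  # illustrative shape
per_block = 12 * d_model**2          # rough attention + 4x-MLP block, no embeddings
stored = blocks * per_block
print(f"stored ~{stored / 1e6:.1f}M params, effective depth {blocks * reuses}")
```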
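And for the tokenizer question, why vocabulary size is a first-order lever under the cap: with tied input and output embeddings, the table alone costs vocab_size × d_model parameters. The width and vocab sizes below are illustrative:

```python
d_model = 512  # illustrative model width
for vocab in (50_257, 16_384, 4_096):
    emb_params = vocab * d_model  # tied input/output embedding table
    print(f"vocab {vocab:>6}: {emb_params / 1e6:.2f}M params, "
          f"{emb_params / 2**20:.1f} MiB at int8")
# A GPT-2-sized vocabulary (50,257) already exceeds a 16 MiB cap at int8
# before any transformer blocks are counted.
```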
Recommended reading order
- Constraints and scoring
- Challenge history
- Local experiment history
- Research lanes
- Research frontiers
- Hypothesis ledger
- Paper index
- Research ideas
- Moonshots
If you need the harness instead
Use the meta layer for:
- benchmark and operating procedure
- background research execution ideas
- Pi/autoresearch extension notes
- editorial planning for the garden itself
References
Bae, S., Fisch, A., Harutyunyan, H., Ji, Z., Kim, S., & Schuster, T. (2024). Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA. arXiv preprint arXiv:2410.20672. https://arxiv.org/abs/2410.20672
Csordás, R., Irie, K., Schmidhuber, J., Potts, C., & Manning, C. D. (2024). MoEUT: Mixture-of-Experts Universal Transformers. arXiv preprint arXiv:2405.16039. https://arxiv.org/abs/2405.16039
Gu, S., Zhao, M., Zhang, B., Wang, L., Li, J., & Liu, G. (2024). ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model. arXiv preprint arXiv:2410.04335. https://arxiv.org/abs/2410.04335
Lotz, J. F., Lopes, A. V., Peitz, S., Setiawan, H., & Emili, L. (2025). Beyond Text Compression: Evaluating Tokenizers Across Scales. arXiv preprint arXiv:2506.03101. https://arxiv.org/abs/2506.03101
Steinmetz, C., Childress, G., Herbst, A., Jones, G., Singh, J., Vang, E., & Weinstock, K. (2025). An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits. arXiv preprint arXiv:2505.08823. https://arxiv.org/abs/2505.08823
Üyük, C., Lasby, M., Yassin, M., Evci, U., & Ioannou, Y. (2024). Learning Parameter Sharing with Tensor Decompositions and Sparsity. arXiv preprint arXiv:2411.09816. https://arxiv.org/abs/2411.09816
Vennam, S., Joishy, A., & Kumaraguru, P. (2024). LLM Vocabulary Compression for Low-Compute Environments. arXiv preprint arXiv:2411.06371. https://arxiv.org/abs/2411.06371
Wu, Y., Sun, Z., Li, S., Welleck, S., & Yang, Y. (2024). Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models. arXiv preprint arXiv:2408.00724. https://arxiv.org/abs/2408.00724
Zhang, W., Liu, B., Hu, Y., Bai, X., Zhang, W., & Cui, B. (2026). pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training. arXiv preprint arXiv:2602.22592. https://arxiv.org/abs/2602.22592