These lanes are the top-level buckets for background reading, hypothesis formation, and synthesis across the compact-LLM design space.
Active lanes
Recursive and shared-parameter architectures
How to trade stored depth for reused computation, wider blocks, or cheap per-step specialization.
Key pages:
- Recursive width scaling
- Recurrent wide architecture
- Phase-conditioned sharing
- Recursive layer sharing
- Compute-for-storage exchange
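To make the storage-for-compute trade concrete, here is a minimal numpy sketch (all sizes hypothetical, and names like `recursive_forward` and `phase_scales` are illustrative, not from any specific paper): one MLP block is stored once and applied several times, with a cheap per-step scale vector standing in for phase-conditioned specialization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, steps = 64, 4  # hidden width and recursion depth (hypothetical sizes)

# One shared block: parameters are stored once but applied `steps` times.
W1 = rng.standard_normal((d, 4 * d)) * 0.02
W2 = rng.standard_normal((4 * d, d)) * 0.02

# Phase conditioning: a cheap per-step scale vector specializes each pass
# without storing a full extra block per step.
phase_scales = np.ones((steps, d))

def recursive_forward(x):
    for t in range(steps):
        h = np.maximum(x @ W1, 0.0) @ W2   # shared MLP with ReLU
        x = x + phase_scales[t] * h        # per-step modulation on the residual
    return x

shared_params = W1.size + W2.size + phase_scales.size
unrolled_params = steps * (W1.size + W2.size)
print(shared_params, unrolled_params)  # shared storage is roughly 1/steps of unrolled
```

The point of the sketch: effective depth is `steps`, but stored bytes stay near one block, and the phase vectors cost only `steps * d` extra parameters.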
Quantization, outliers, and compression-aware training
How to make the final compressed artifact behave like the trained model instead of collapsing under uniform low-bit treatment.
Key pages:
- RMSNorm stabilized scaling
- Sparse outlier preservation
- Normalization before projections
- Outlier-aware compression
- Decoupled precision
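A minimal sketch of the selectivity idea, not tied to any particular method (the `quantize_with_outliers` helper and all sizes are hypothetical): pull a small fraction of large-magnitude weights into an exact side table so the low-bit scale is set by the bulk of the distribution rather than the tail.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(4096)
w[rng.choice(4096, 8, replace=False)] *= 40.0  # inject a few large outliers

def quantize_with_outliers(w, bits=4, outlier_frac=0.005):
    # Pull the largest-magnitude entries into a sparse full-precision table...
    k = max(1, int(len(w) * outlier_frac))
    idx = np.argsort(np.abs(w))[-k:]
    outliers = w[idx].copy()
    dense = w.copy()
    dense[idx] = 0.0
    # ...so the quantization scale is set by the bulk, not the tail.
    scale = np.abs(dense).max() / (2 ** (bits - 1) - 1)
    q = np.round(dense / scale).astype(np.int8)
    return q, scale, idx, outliers

def dequantize(q, scale, idx, outliers):
    w_hat = q.astype(np.float64) * scale
    w_hat[idx] = outliers
    return w_hat

q, scale, idx, outliers = quantize_with_outliers(w)
err_outlier_aware = np.abs(dequantize(q, scale, idx, outliers) - w).max()

# Uniform baseline: one 4-bit scale for everything, outliers included.
scale_u = np.abs(w).max() / 7
err_uniform = np.abs(np.round(w / scale_u) * scale_u - w).max()
print(err_outlier_aware < err_uniform)  # the uniform scale is wrecked by the tail
```

This is exactly the uniformity-vs-selectivity tension: the sparse table costs a few extra bytes per outlier but keeps the dense grid fine-grained.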
Tokenizer and vocabulary efficiency
How tokenization, vocabulary size, and the LM head reshape both compute and stored bytes in compact language models.
Key pages:
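Back-of-envelope arithmetic for the tradeoff this lane studies (every number below is hypothetical, including the tokens-per-word figures): a bigger vocabulary shortens sequences, but the tied embedding / LM head grows linearly in vocabulary size, which bites hard under a 16 MB cap.

```python
# Hypothetical compact model: hidden width d, 4-bit weight storage,
# comparing two tokenizer choices. tokens_per_word values are made up
# for illustration, not measured.
d = 256
bytes_per_param = 0.5  # 4-bit storage

for vocab, tokens_per_word in [(8_000, 1.6), (32_000, 1.2)]:
    head_bytes = vocab * d * bytes_per_param  # tied embedding + output head
    print(vocab, head_bytes / 2**20, tokens_per_word)
```

At these sizes the 32k vocabulary costs about 3.9 MiB of head storage versus about 1.0 MiB for the 8k one, so the per-sequence compute saved by shorter token streams has to be weighed against a quarter of the whole byte budget.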
Training economics and small-model bottlenecks
How compact-model regimes change what matters: logit computation, sequence length, parameter reuse, and width allocation can dominate cost sooner than standard scaling intuitions suggest.
Key pages:
- Output-head compression
- Recursive width scaling
- The LM head is part of the compression problem
- Compute-for-storage exchange
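The "logits can dominate" point can be sanity-checked with the standard parameter-count estimate (roughly 12·L·d² for the transformer blocks, V·d for the output head); the sizes below are hypothetical, and the estimate ignores norms and biases.

```python
# Rough split of per-token parameters (and hence FLOPs, via the 2*params
# estimate) between transformer blocks and the LM head. Sizes are hypothetical.
def head_fraction(d, n_layers, vocab):
    block_params = 12 * n_layers * d * d  # attention + MLP, ignoring norms/biases
    head_params = vocab * d               # output projection
    return head_params / (head_params + block_params)

# A compact model: the head can be the majority of the work.
small = head_fraction(d=256, n_layers=8, vocab=32_000)
# A large model: the head is a rounding error.
big = head_fraction(d=4096, n_layers=32, vocab=32_000)
print(round(small, 2), round(big, 3))
```

Because the head term scales as V·d while the blocks scale as d², shrinking d flips which one dominates, which is why the LM head shows up as a first-class compression problem in this regime.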
Evaluation-time compute and inference scaling
How a compact model can use bounded extra reasoning or refinement steps to outperform a larger static artifact.
Key pages:
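A sketch of the bounded-refinement pattern this lane covers; `propose` and `score` are hypothetical stand-ins for a draft-and-revise model, and the toy instance just refines a square-root estimate to keep the sketch self-contained.

```python
# Bounded eval-time refinement: spend up to `budget` extra passes,
# keep the best candidate, and stop early once improvement stalls.
def refine(x0, propose, score, budget=4, min_gain=1e-3):
    best, best_score = x0, score(x0)
    for _ in range(budget):
        cand = propose(best)
        s = score(cand)
        if s < best_score + min_gain:  # no meaningful gain: stop spending compute
            break
        best, best_score = cand, s
    return best

# Toy instance: "refining" a guess for sqrt(2) by averaging with 2/x.
ans = refine(1.0,
             propose=lambda x: 0.5 * (x + 2 / x),
             score=lambda x: -abs(x * x - 2))
print(abs(ans * ans - 2) < 1e-3)
```

The lane question is whether a compact model plus this kind of capped loop beats a larger static artifact at equal wall-clock or byte budget; the loop itself costs zero stored parameters.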
Why lane pages matter
Paper notes are too granular, and experiment logs too specific, for lane-level synthesis. Lane pages are where we answer:
- what lever this family is trying to pull
- why it matters under the 16 MB cap
- which mechanisms sit inside the lane
- where the lane naturally composes with other lanes
- which concrete hypotheses deserve follow-up next
Cross-lane tensions worth tracking
- storage vs compute: shared depth and evaluation-time refinement both spend time to save bytes
- uniformity vs selectivity: low-bit methods win when they stop treating every tensor as equally fragile
- sequence length vs vocab size: a tokenizer can save tokens while making the output layer harder to store
- width vs specialization: recurrent blocks want more width, but they often need cheap phase-specific behavior to avoid collapse