This page tracks the working hypotheses that deserve repeated attention in the compact-LLM conceptual graph.

Current lane balance

The graph now has three especially important clusters:

  1. compression-aware robustness for closing the train-to-artifact gap
  2. shared-depth architectures for trading stored parameters for reusable computation
  3. tokenizer, vocabulary, and LM-head efficiency for treating the output path as part of the budget

Evaluation-time compute remains a strategic overlay on top of compact recurrent designs rather than a fully separate world.

Current hypotheses

RMSNorm stabilized scaling
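
The name alone does not pin down the mechanism, so as a working reference here is a minimal NumPy sketch of RMSNorm itself, the normalizer this hypothesis presumably builds on (function and variable names are illustrative, not from any particular codebase):

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # RMSNorm: rescale by root-mean-square only -- no mean subtraction,
    # no variance estimate, just one learned gain vector per feature.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return gain * x / rms

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)) * 50.0   # deliberately badly scaled activations
gain = np.ones(8)
y = rms_norm(x, gain)
row_rms = np.sqrt(np.mean(y * y, axis=-1))  # each row's RMS is pulled to ~1.0
```

The point of the sketch is the stabilization property: whatever scale the input arrives at, every row leaves with unit RMS, which is the kind of invariant a "stabilized scaling" hypothesis would lean on.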

Sparse outlier preservation
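
If this hypothesis is read as mixed-precision quantization that keeps the largest-magnitude weights dense while quantizing the rest, a minimal sketch looks like the following (the split strategy and the 1% outlier fraction are assumptions for illustration):

```python
import numpy as np

def quantize_with_outliers(w, outlier_frac=0.01):
    # Keep the largest-magnitude entries dense in float; symmetrically
    # quantize everything else to int8 with a single scale.
    k = max(1, int(outlier_frac * w.size))
    thresh = np.partition(np.abs(w).ravel(), -k)[-k]
    mask = np.abs(w) >= thresh                 # outlier positions stay dense
    dense = np.where(mask, w, 0.0).astype(np.float32)
    rest = np.where(mask, 0.0, w)
    scale = float(np.max(np.abs(rest))) / 127.0
    q = np.round(rest / scale).astype(np.int8)
    return q, scale, dense

def dequantize(q, scale, dense):
    return q.astype(np.float32) * scale + dense

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
w[0, 0] = 80.0                                 # one extreme outlier
q, scale, dense = quantize_with_outliers(w)
err = np.abs(dequantize(q, scale, dense) - w).max()
```

The outlier is reproduced exactly while the int8 scale is set by the well-behaved bulk of the tensor, which is the whole argument for preserving outliers sparsely instead of letting them blow up the quantization range.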

Recursive width scaling

Recurrent wide architecture

Phase-conditioned sharing
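
One cheap reading of this hypothesis, sketched here as an assumption rather than a settled design, is a single shared heavy matrix specialized per phase by a tiny learned gain vector:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_phases = 32, 3
W = rng.standard_normal((d, d)) * 0.05                  # one shared heavy matrix
gains = 1.0 + 0.1 * rng.standard_normal((n_phases, d))  # tiny per-phase vectors

def phase_block(h, phase):
    # The d*d matrix is shared across all phases; only a length-d gain
    # vector specializes the block to the current phase.
    return h + np.tanh(h * gains[phase]) @ W

h0 = rng.standard_normal((1, d))
h = h0
for p in range(n_phases):
    h = phase_block(h, p)
```

Here the shared parameters cost d*d = 1024 entries plus 3*32 = 96 phase-specific ones, versus 3*1024 for three independent blocks, which is the kind of trade the "cheapest form of specialization" question below is asking about.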

Output-head compression

Iterative refinement over stored depth
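
Read as "depth from iteration rather than from storage", this hypothesis can be sketched as one residual block whose weights are reused at every step (the block shape and step counts are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32
W1 = rng.standard_normal((d, d)) * 0.05   # the only stored weights:
W2 = rng.standard_normal((d, d)) * 0.05   # one block, reused every step

def shared_block(x):
    # A single residual MLP block applied repeatedly.
    return x + np.tanh(x @ W1) @ W2

def run(x, steps):
    # Effective depth = number of iterations, not number of stored layers.
    for _ in range(steps):
        x = shared_block(x)
    return x

x = rng.standard_normal((1, d))
shallow = run(x, 2)
deep = run(x, 8)   # 4x the effective depth at identical parameter count
```

This is also where the evaluation-time-compute overlay attaches: the same artifact can spend more steps per token when the input warrants it, trading compute for stored parameters.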

Near-term ranking by conceptual leverage

  1. RMSNorm stabilized scaling
  2. Sparse outlier preservation
  3. Recursive width scaling
  4. Output-head compression
  5. Phase-conditioned sharing
  6. Iterative refinement over stored depth

Cross-lane questions

  • when does architecture beat compression-side robustness work?
  • how often is the LM head the hidden bottleneck in a compact model?
  • what is the cheapest form of specialization that keeps shared-depth models expressive?
  • when does extra evaluation-time compute beat extra stored parameters?
  • which improvements survive once the whole artifact, including vocab-dependent pieces, is compressed?