8 notes in this folder:
- Synthesis note on the recurring compact-model idea that repeated computation can substitute for stored parameters.
- Note on why pre-projection normalization is a recurring pattern in low-bit and compression-aware transformer design.
- Concept note on why outliers dominate low-bit failure and why most serious compression methods end up treating them specially.
- Synthesis note on why vocabulary and output-projection choices can dominate compact-model tradeoffs earlier than expected.
- Synthesis note on why shared-depth transformer designs are attractive under a hard artifact budget, and where they usually break.
- Synthesis note on why recurrent transformers often need tiny phase-specific signals instead of perfectly identical behavior at every depth.
- Concept note on why tokenization changes not just sequence length but the whole byte/compute story of compact language models.
- Synthesis note on the recurring idea that a small subset of sensitive parameters deserves better precision than the rest.