8 notes in this folder:
- Synthesis note on the recurring compact-model idea that repeated computation can substitute for stored parameters.
- Note on why pre-projection normalization is a recurring pattern in low-bit and compression-aware transformer design.
- Concept note on why outliers dominate low-bit failure and why most serious compression methods end up treating them specially.
- Synthesis note on why vocabulary and output-projection choices can dominate compact-model tradeoffs earlier than expected.
- Synthesis note on why shared-depth transformer designs are attractive under a hard artifact budget, and where they usually break.
- Synthesis note on why recurrent transformers often need tiny phase-specific signals instead of perfectly identical behavior at every depth.
- Concept note on why tokenization changes not just sequence length but the whole byte/compute story of compact language models.
- Synthesis note on the recurring idea that a small subset of sensitive parameters deserves better precision than the rest.