5 items tagged "ideas".
Hypothesis that most head-side quantization damage is concentrated in a tiny set of difficult token rows, making row-level protection a better byte trade than uniform head precision.
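A minimal sketch of the trade, assuming symmetric per-row int4 quantization and using per-row reconstruction error as a stand-in for "difficulty"; `protect_fraction` and the helper names are illustrative, not from the note itself.

```python
import torch

def quantize_rows_int4(w: torch.Tensor) -> torch.Tensor:
    """Symmetric per-row 4-bit quantization; returns the dequantized rows."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = (w / scale).round().clamp(-8, 7)
    return q * scale

def row_protected_head(w: torch.Tensor, protect_fraction: float = 0.02):
    """Quantize every row, then keep the worst-hit rows in full precision."""
    deq = quantize_rows_int4(w)
    err = (w - deq).pow(2).sum(dim=1)      # per-token-row quantization damage
    k = max(1, int(protect_fraction * w.shape[0]))
    protected = err.topk(k).indices        # the "difficult" token rows
    deq[protected] = w[protected]          # spend full-precision bytes only here
    return deq, protected

vocab, d_model = 32_000, 1024
w = torch.randn(vocab, d_model)
w_mixed, rows = row_protected_head(w)
print(f"protected {rows.numel()} of {vocab} rows")
```

Protecting 2% of rows costs roughly 2% of the fp16 head bytes, versus the uniform alternative of raising precision for the entire matrix.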
Hypothesis that one small learned codebook bank shared across repeated blocks can beat per-matrix quantization by amortizing metadata and aligning compression with shared-depth structure.
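A minimal sketch, assuming plain k-means vector quantization over 4-dim weight sub-vectors pooled from every repeated block, so one codebook's metadata is amortized across all of them; the sizes and function names are illustrative.

```python
import torch

def fit_codebook(samples: torch.Tensor, k: int = 256, iters: int = 10):
    """Plain k-means over weight sub-vectors."""
    centers = samples[torch.randperm(samples.shape[0])[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(samples, centers).argmin(dim=1)
        for j in range(k):
            sel = samples[assign == j]
            if sel.numel():
                centers[j] = sel.mean(dim=0)
    return centers

def encode(w: torch.Tensor, centers: torch.Tensor, group: int = 4):
    flat = w.reshape(-1, group)
    return torch.cdist(flat, centers).argmin(dim=1)   # byte-sized indices

blocks = [torch.randn(128, 128) for _ in range(8)]    # repeated-block weights
pool = torch.cat([b.reshape(-1, 4) for b in blocks])  # shared training pool
codebook = fit_codebook(pool)                         # one bank for all blocks
codes = [encode(b, codebook) for b in blocks]         # per-block indices only
```

Per-matrix schemes pay for scales or codebooks once per matrix; here the eight repeated blocks share one 256x4 table, so the metadata cost is divided by eight.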
Hypothesis that shrinking the tokenizer and LM-head byte burden, then reinvesting the saved bytes into a wider shared backbone, beats spending the same budget on a larger static head.
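Back-of-the-envelope arithmetic for the trade, under assumed sizes: fp16 storage, a weight-tieable embedding/head, and roughly 12*d^2 parameters per transformer block. Both configurations are hypothetical.

```python
BYTES = 2  # fp16

def model_bytes(vocab, d, unique_blocks, tied_head=True):
    embed = vocab * d * (1 if tied_head else 2)   # embedding (+ head if untied)
    backbone = unique_blocks * 12 * d * d         # ~12*d^2 params per block
    return (embed + backbone) * BYTES

# A: big vocab, separate static head, 12 distinct blocks.
a = model_bytes(128_000, 768, 12, tied_head=False)
# B: small vocab, tied head, 4 shared blocks unrolled for 12 steps,
#    with the saved bytes spent on width instead.
b = model_bytes(32_000, 1152, 4, tied_head=True)
print(f"A: {a/1e6:.0f} MB, B: {b/1e6:.0f} MB")    # B is smaller *and* wider
```

Under these assumptions, matching A's full budget would let option B push width to roughly d = 2100 with the same four shared blocks.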
Hypothesis that shared-depth models can recover most layer-role specialization using only per-step RMSNorm and tiny channel gates, with almost no byte cost.
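A minimal sketch, assuming one weight-tied block steered per step by its own RMSNorm scale and a sigmoid channel gate; the toy MLP stands in for a full transformer block, and the 2*d-per-step parameter cost is the point.

```python
import torch
import torch.nn as nn

class SharedDepth(nn.Module):
    def __init__(self, d=512, steps=12):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(),
                                   nn.Linear(4 * d, d))    # one shared block
        # per-step steering parameters: only 2*d values each step
        self.norm_scale = nn.Parameter(torch.ones(steps, d))
        self.gate = nn.Parameter(torch.zeros(steps, d))

    def forward(self, x):
        for t in range(self.norm_scale.shape[0]):
            h = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6)
            h = h * self.norm_scale[t]                     # per-step RMSNorm
            x = x + torch.sigmoid(self.gate[t]) * self.block(h)  # gated residual
        return x

m = SharedDepth()
print(m(torch.randn(2, 16, 512)).shape)
```

The steering parameters total steps * 2 * d values, which next to the ~12*d^2 of even one distinct block is effectively free in bytes.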
Hypothesis that a compact shared-depth model should spend extra inference-time passes only on uncertain positions, turning compute into quality more efficiently than storing more static depth.
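A minimal sketch, assuming softmax entropy of the current logits as the uncertainty signal and a fixed extra-pass budget; the threshold `tau` and the stand-in block are illustrative choices, not the note's recipe.

```python
import torch
import torch.nn as nn

d, vocab, base_steps, extra_steps, tau = 512, 1000, 4, 4, 6.0
block = nn.Sequential(nn.Linear(d, d), nn.GELU())    # stand-in shared block
head = nn.Linear(d, vocab)

def entropy(logits):
    p = logits.softmax(-1)
    return -(p * p.clamp(min=1e-9).log()).sum(-1)    # nats, per position

x = torch.randn(2, 32, d)
for _ in range(base_steps):                          # every position gets these
    x = x + block(x)
unsure = entropy(head(x)) > tau                      # positions still uncertain
for _ in range(extra_steps):                         # only they pay more compute
    x = torch.where(unsure[..., None], x + block(x), x)
print(f"refined {unsure.float().mean():.0%} of positions")
```

Byte cost stays fixed at the one shared block; only the uncertain positions pay the extra passes, which is the sense in which compute is traded for quality instead of stored depth.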