20 items with this tag.
frontiers
Frontier synthesis on the newest shift from handcrafted post-training formats toward training rules and learned representations that directly target compressible model weights.
papers
Paper note on integrating rate-constrained compression pressure directly into LLM training rather than treating compression only as a post-training step.
papers
Paper note on using a single learned neural codec to compress whole LLM-scale weight sets instead of relying only on handcrafted quantization formats.
papers
Paper note on making LLM training explicitly produce more low-rank, compressible weights by constraining Muon updates with a nuclear-norm budget.
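The nuclear-norm budget mentioned above can be illustrated with a minimal numpy sketch: projecting an update matrix onto a nuclear-norm ball by projecting its singular values onto an l1-ball (the standard Duchi-style projection). This is an assumption about the general technique, not a reproduction of the paper's Muon variant; the function names are hypothetical.

```python
import numpy as np

def project_l1_ball(v, tau):
    # Euclidean projection of a nonnegative vector v onto {x >= 0 : sum(x) <= tau}
    if v.sum() <= tau:
        return v
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - tau) / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def project_nuclear_ball(update, tau):
    # Clip the update's singular values so its nuclear norm stays within budget tau;
    # repeatedly applying this to optimizer updates biases weights toward low rank.
    U, s, Vt = np.linalg.svd(update, full_matrices=False)
    return U @ np.diag(project_l1_ball(s, tau)) @ Vt
```

Projecting the singular spectrum rather than truncating it keeps the update direction while shrinking its effective rank gradually.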
papers
Paper note on exploiting weight-space symmetries with bits-back coding so some model bytes can be saved without changing predictions.
moonshots
Moonshot hypothesis that the model should be trained directly to become a good compressed artifact, not merely a good floating-point checkpoint.
moonshots
Moonshot hypothesis that the best compact artifact may store a tiny generator plus latent construction tape and sparse corrections instead of mostly storing raw weight tensors.
moonshots
Moonshot hypothesis that the shape of protected exceptions may matter more than the exact saliency ranking, because structured exception maps can compress better than irregular ones.
moonshots
Moonshot hypothesis that most vocabulary rows in the output head should be regenerated from compact descriptors and shared factors rather than stored directly.
moonshots
Moonshot hypothesis that many apparently different tensors could be stored as one canonical prototype plus cheap transport maps instead of as separate weights.
papers
Paper note on shrinking and retargeting the tokenizer and embedding table to a domain so the model uses fewer vocabulary bytes and shorter sequences.
papers
Paper note on applying rate-distortion theory directly to language-model compression instead of treating bit allocation as a heuristic afterthought.
papers
Paper note on compressing language-model matrices into residual low-rank structure plus a shared neural decoder over vector-quantized latent representations.
frontiers
Frontier synthesis on why repeated structure, clustered values, and regular exception patterns may matter more than nominal precision once the final artifact and metadata are counted.
hypotheses
Hypothesis that compressing or restructuring the LM head can beat modest backbone improvements in compact language models.
ideas
Idea that one small learned codebook bank, shared across repeated blocks, can beat per-matrix quantization by amortizing metadata and aligning compression with shared-depth structure.
lanes
The lane focused on reducing the gap between train-time weights and the final compressed artifact.
notes
Synthesis note on why vocabulary and output-projection choices can dominate compact-model tradeoffs earlier than expected.
papers
Paper note on AQLM and why codebook-style additive quantization becomes attractive once scalar low-bit methods start wasting error budget on the wrong directions.
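The additive-quantization idea behind AQLM can be sketched in a few lines: each weight group is reconstructed as a sum of one codeword from each of several codebooks, so the representable points are not a uniform grid. This toy version uses greedy per-codebook encoding (a stand-in for AQLM's beam search) and a reserved zero codeword so each extra codebook can only reduce the residual; all names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, d = 2, 16, 8                     # codebooks, codewords per codebook, group dim
codebooks = rng.normal(size=(M, K, d))
codebooks[:, 0, :] = 0.0               # zero codeword: "skip this codebook"

def encode(w):
    # Greedily pick, per codebook, the codeword that most reduces the residual.
    residual, codes = w.copy(), []
    for m in range(M):
        errs = ((residual[None, :] - codebooks[m]) ** 2).sum(axis=1)
        k = int(errs.argmin())
        codes.append(k)
        residual -= codebooks[m][k]
    return codes

def decode(codes):
    # Reconstruction is the sum of one codeword from each codebook.
    return sum(codebooks[m][k] for m, k in enumerate(codes))
```

Because codewords are full d-dimensional vectors, the error budget is spent along directions the data actually occupies, rather than axis-by-axis as in scalar quantization.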
papers
Paper note on clustering-based compression as a way to exploit weight structure and outlier concentration when uniform quantization gets brittle.
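A minimal sketch of the clustering approach in the last note, in the spirit of Deep Compression-style weight sharing (assumed here, not taken from the paper): scalar weights are k-means-clustered into a small centroid table, and the artifact stores log2(k)-bit indices plus the centroids, so clustered values and concentrated outliers cost less than a uniform grid would.

```python
import numpy as np

def kmeans_quantize(weights, k=16, iters=20, seed=0):
    # Cluster 1-D weights into k shared centroids; the compressed artifact
    # stores log2(k)-bit indices per weight plus the small centroid table.
    rng = np.random.default_rng(seed)
    centroids = rng.choice(weights, size=k, replace=False)
    for _ in range(iters):
        idx = np.abs(weights[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = weights[idx == j]
            if members.size:                 # keep centroid if cluster empties
                centroids[j] = members.mean()
    return centroids, idx

w = np.random.default_rng(1).normal(size=4096)
centroids, idx = kmeans_quantize(w)          # 16 centroids -> 4-bit indices
w_hat = centroids[idx]                       # reconstructed weights
```

Unlike uniform quantization, the centroids move toward dense value clusters, which is exactly the regime where uniform grids get brittle.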