18 items with this tag.
papers · Paper note on pushing beyond 1-bit LLM compression by factorizing weight matrices into binary latent factors with learned multi-scale compensation.
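A minimal sketch of the general factorization idea, not the paper's actual algorithm: greedy residual binarization expresses a matrix as a sum of scaled sign matrices, where each scale is the least-squares optimum for the current residual.

```python
import numpy as np

def binary_factorize(W, num_factors=3):
    """Approximate W as sum_k alpha_k * B_k with B_k in {-1, +1}.

    Greedy residual binarization (illustrative, not the paper's
    method): the sign of the residual gives each binary factor, and
    the mean absolute residual is the least-squares optimal scale.
    """
    R = W.astype(np.float64).copy()
    scales, factors = [], []
    for _ in range(num_factors):
        B = np.sign(R)
        B[B == 0] = 1.0              # avoid zeros in the binary factor
        alpha = np.abs(R).mean()     # argmin_a ||R - a*B||_F
        scales.append(alpha)
        factors.append(B)
        R -= alpha * B               # peel off this scale level
    return scales, factors

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
scales, factors = binary_factorize(W)
W_hat = sum(a * B for a, B in zip(scales, factors))
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

Each extra factor costs one bit per weight plus one scalar, which is where the multi-scale compensation framing comes from.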
frontiers · Frontier synthesis on why the next big compression wins may come from explicit byte-return accounting rather than from nominally better low-bit methods alone.
papers · Paper note on preserving a tiny set of outlier-sensitive weight columns in high precision while quantizing the rest of the model aggressively.
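A hedged sketch of that mixed-precision split, with column max-magnitude standing in for the sensitivity score (the real criterion would typically be activation-informed):

```python
import numpy as np

def quant_int4_per_col(W, levels=7):
    """Symmetric int4-style quantization, one scale per column."""
    scale = np.abs(W).max(axis=0, keepdims=True) / levels
    scale[scale == 0] = 1.0
    return np.clip(np.round(W / scale), -levels, levels) * scale

def mixed_precision(W, keep_frac=0.01):
    """Keep the most outlier-heavy columns exactly, quantize the rest.
    Column max-magnitude is an illustrative stand-in sensitivity score."""
    k = max(1, int(W.shape[1] * keep_frac))
    keep = np.argsort(np.abs(W).max(axis=0))[-k:]
    W_hat = quant_int4_per_col(W)
    W_hat[:, keep] = W[:, keep]        # protected columns stay exact
    return W_hat

rng = np.random.default_rng(1)
W = rng.normal(size=(128, 128))
W[:, 5] *= 25.0                        # plant one outlier column
base = np.linalg.norm(W)
print("plain :", np.linalg.norm(W - quant_int4_per_col(W)) / base)
print("mixed :", np.linalg.norm(W - mixed_precision(W)) / base)
```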
papers · Paper note on applying rate-distortion theory directly to language-model compression instead of treating bit allocation as a heuristic afterthought.
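A toy stand-in for the rate-distortion framing, assuming the classic high-rate proxy D_i(b) = var_i · 4^(-b) rather than anything from the paper: greedily grant each extra bit to the layer where it buys the largest distortion drop.

```python
import heapq

def allocate_bits(variances, total_bits, max_bits=8):
    """Greedy bit allocation under a total bit budget.

    Uses the high-rate proxy D_i(b) = var_i * 4**(-b): each extra bit
    keeps a quarter of the remaining distortion. Repeatedly hand one
    bit to the layer with the largest marginal distortion drop. A toy
    stand-in for a real rate-distortion formulation.
    """
    bits = [0] * len(variances)

    def gain(i):
        # distortion drop from granting layer i its next bit
        return variances[i] * 4.0 ** (-bits[i]) * 0.75

    heap = [(-gain(i), i) for i in range(len(variances))]
    heapq.heapify(heap)
    for _ in range(total_bits):
        if not heap:
            break
        _, i = heapq.heappop(heap)
        bits[i] += 1
        if bits[i] < max_bits:
            heapq.heappush(heap, (-gain(i), i))
    return bits

# sensitive layers absorb most of the budget
print(allocate_bits([10.0, 1.0, 0.1, 0.1], total_bits=12))  # [5, 3, 2, 2]
```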
papers · Paper note on compressing language-model matrices into residual low-rank structure plus a shared neural decoder over vector-quantized latent representations.
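An illustrative reduction of that recipe, with the shared neural decoder simplified to plain codebook lookup: truncated SVD captures the low-rank part, and k-means vector quantization handles the residual. All sizes here are arbitrary.

```python
import numpy as np

def compress_lowrank_plus_vq(W, rank=8, group=4, codebook_size=256, iters=10):
    """Toy 'low-rank + vector-quantized residual' compression.

    1) Truncated SVD captures the smooth low-rank structure.
    2) The residual is cut into length-`group` vectors, each replaced
       by its nearest k-means codebook entry. The paper's learned
       shared decoder is simplified here to direct codebook lookup.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    R = (W - low_rank).reshape(-1, group)      # residual as vectors

    rng = np.random.default_rng(0)
    codebook = R[rng.choice(len(R), codebook_size, replace=False)]
    for _ in range(iters):                     # plain k-means
        d = ((R**2).sum(1)[:, None] - 2.0 * R @ codebook.T
             + (codebook**2).sum(1)[None, :])
        codes = d.argmin(1)
        for c in range(codebook_size):
            members = R[codes == c]
            if len(members):
                codebook[c] = members.mean(0)
    return low_rank + codebook[codes].reshape(W.shape)

rng = np.random.default_rng(2)
W = rng.normal(size=(64, 256))
W_hat = compress_lowrank_plus_vq(W)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```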
frontiers · Frontier synthesis on why the next gains may come from allocating scarce high-fidelity bytes intelligently rather than chasing a single global quantization regime.
hypotheses · Hypothesis that an extra RMSNorm before projections improves post-roundtrip quality by stabilizing low-bit training and export.
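A sketch of the hypothesized pattern in PyTorch: an extra RMSNorm directly in front of a projection, so any downstream quantizer sees inputs at a controlled scale. The module names are made up for illustration.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Standard RMSNorm: x * g / rms(x)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.g = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.g

class NormedLinear(nn.Module):
    """Linear projection with an extra RMSNorm in front, so an
    (eventual) low-bit quantizer always sees unit-scale inputs.
    Illustrative of the hypothesis, not any specific model's code."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.norm = RMSNorm(d_in)
        self.proj = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        return self.proj(self.norm(x))

x = torch.randn(2, 16, 512) * 30.0   # wildly scaled activations
y = NormedLinear(512, 512)(x)        # projection input is re-normalized
print(y.shape)
```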
hypotheses · Hypothesis that protecting a tiny subset of highly sensitive parameters buys disproportionately large quality gains under a strict artifact cap.
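One way to make that hypothesis concrete, with |w| as a loudly assumed stand-in for true sensitivity: pull the top fraction of weights into an exact sparse side table and quantize the dense remainder.

```python
import numpy as np

def quant_absmax(W, levels=7):
    """Symmetric quantization with a single per-tensor scale."""
    scale = np.abs(W).max() / levels
    if scale == 0:
        scale = 1.0
    return np.clip(np.round(W / scale), -levels, levels) * scale

def protect_topk(W, frac=0.005, levels=7):
    """Store the top `frac` of weights by magnitude exactly and
    quantize the dense remainder. |w| stands in for true sensitivity
    (real methods use activation- or Hessian-based scores)."""
    k = max(1, int(W.size * frac))
    idx = np.argpartition(np.abs(W).ravel(), -k)[-k:]
    mask = np.zeros(W.size, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(W.shape)
    dense = np.where(mask, 0.0, W)        # outliers pulled out
    return np.where(mask, W, quant_absmax(dense, levels))

rng = np.random.default_rng(3)
W = rng.standard_t(2, size=(256, 256))    # heavy-tailed weights
base = np.linalg.norm(W)
print("plain    :", np.linalg.norm(W - quant_absmax(W)) / base)
print("protected:", np.linalg.norm(W - protect_topk(W)) / base)
```

On heavy-tailed weights the handful of protected outliers also stops them from inflating the shared quantization scale, which is where the disproportionate gain comes from.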
lanes · The lane focused on reducing the gap between train-time weights and the final compressed artifact.
notes · Why pre-projection normalization is a recurring pattern in low-bit and compression-aware transformer design.
notes · Concept note on why outliers dominate low-bit failure and why most serious compression methods end up treating them specially.
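The failure mode is easy to reproduce: with a shared absmax scale, a single outlier coarsens the quantization grid for every other value.

```python
import numpy as np

def absmax_int4(x):
    """Symmetric int4 quantization with one scale for the whole tensor."""
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -7, 7) * scale

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
x_out = x.copy()
x_out[0] = 100.0                     # a single outlier

# error on the 999 "normal" entries, with and without the outlier
e_clean = np.abs(x[1:] - absmax_int4(x)[1:]).mean()
e_dirty = np.abs(x_out[1:] - absmax_int4(x_out)[1:]).mean()
print(f"mean error without outlier: {e_clean:.4f}")
print(f"mean error with outlier:    {e_dirty:.4f}")
```

One outlier stretches the scale to 100/7, so most normal entries round straight to zero: an order-of-magnitude error increase on values the outlier never touched.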
papers · Paper note on activation-aware weight quantization and the claim that a tiny set of salient channels dominates low-bit error.
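A sketch of the scaling trick at the heart of the activation-aware approach, under simplifying assumptions (per-column scales and one fixed factor s instead of a searched per-channel scale): scale salient input channels up before quantizing and fold the inverse into the activations, which is exact in full precision.

```python
import numpy as np

def quant_int4_per_col(W, levels=7):
    """Symmetric int4 quantization, one scale per output column."""
    scale = np.abs(W).max(axis=0, keepdims=True) / levels
    scale[scale == 0] = 1.0
    return np.clip(np.round(W / scale), -levels, levels) * scale

def awq_style(W, X, top_frac=0.02, s=2.0):
    """Scale the most activation-salient input channels of W up by s
    before quantizing, then fold 1/s into the activations:
    (X / s_vec) @ (s_vec[:, None] * W) == X @ W in full precision.
    Illustrative only; the real method searches per-channel scales
    and quantizes group-wise."""
    saliency = np.abs(X).mean(axis=0)              # per input channel
    k = max(1, int(len(saliency) * top_frac))
    top = np.argsort(saliency)[-k:]
    s_vec = np.ones(W.shape[0])
    s_vec[top] = s
    W_q = quant_int4_per_col(W * s_vec[:, None])   # salient rows protected
    return (X / s_vec) @ W_q

rng = np.random.default_rng(5)
X = rng.normal(size=(64, 256))
X[:, :5] *= 20.0                                   # a few salient channels
W = rng.normal(size=(256, 256))
ref = X @ W
print("plain:", np.linalg.norm(ref - X @ quant_int4_per_col(W)))
print("awq  :", np.linalg.norm(ref - awq_style(W, X)))
```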
papers · Paper note on training ternary 1.58-bit language models from scratch and why ultra-low-bit modeling should be treated as a native design regime.
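The weight quantizer behind this line of work is compact enough to state directly: absmean scaling followed by rounding into {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits per weight. A standalone sketch; in the actual models this runs inside the training forward pass, not as post-hoc rounding.

```python
import numpy as np

def ternary_absmean(W, eps=1e-8):
    """Ternary (1.58-bit) weight quantization in the absmean style:
    scale by the mean absolute value, then round to {-1, 0, +1}.
    Post-hoc here for illustration; the models train with this
    quantizer in the loop (quantization-aware)."""
    gamma = np.abs(W).mean() + eps
    W_t = np.clip(np.round(W / gamma), -1, 1)
    return W_t, gamma                       # stored weights and scale

rng = np.random.default_rng(6)
W = rng.normal(size=(8, 8))
W_t, gamma = ternary_absmean(W)
print(W_t)                                  # entries in {-1, 0, 1}
print("dequantized row:", (W_t * gamma)[0])
```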
papers · Paper note on the claim that an extra RMSNorm before linear projections is a disproportionately strong stabilizer for extreme low-bit finetuning.
papers · Paper note on decoupled low-bit training with a tiny high-precision branch for the parameters that matter most.
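A generic sketch of the decoupling idea, not the paper's exact parameterization: a fake-quantized main path trained with a straight-through estimator, plus a small full-precision low-rank branch that carries the most important directions.

```python
import torch
import torch.nn as nn

class DecoupledLowBitLinear(nn.Module):
    """Low-bit main path plus a tiny full-precision side branch.

    The dense weight passes through a ternary fake-quantizer with a
    straight-through estimator (STE); a small low-rank branch stays
    in full precision. Illustrative of 'decoupled' training in
    general, not the specific paper's design.
    """
    def __init__(self, d_in, d_out, rank=4):
        super().__init__()
        self.w = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        self.a = nn.Parameter(torch.randn(rank, d_in) * 0.02)  # fp branch
        self.b = nn.Parameter(torch.zeros(d_out, rank))

    def quantize(self, w):
        # ternary absmean fake-quant with straight-through gradients
        gamma = w.abs().mean().clamp_min(1e-8)
        w_q = (w / gamma).round().clamp(-1, 1) * gamma
        return w + (w_q - w).detach()   # forward: w_q, backward: identity

    def forward(self, x):
        main = x @ self.quantize(self.w).t()
        side = (x @ self.a.t()) @ self.b.t()   # high-precision correction
        return main + side

layer = DecoupledLowBitLinear(64, 64)
x = torch.randn(32, 64)
loss = layer(x).pow(2).mean()
loss.backward()                        # gradients flow through the STE
print(layer.w.grad.abs().mean().item())
```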
papers · Paper note on using rotations to remove hidden-state outliers so that weights, activations, and KV cache can all be quantized more uniformly.
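The core identity is checkable in a few lines: for orthogonal Q, (XQ)(QᵀW) = XW exactly in full precision, while the rotation smears outlier channels across all dimensions. A random QR rotation is used here for illustration; the actual methods fold structured (e.g. Hadamard) rotations into adjacent layers.

```python
import numpy as np

def random_orthogonal(d, seed=0):
    """Random orthogonal matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return Q

rng = np.random.default_rng(7)
X = rng.normal(size=(512, 256))        # hidden states
X[:, :3] *= 50.0                       # a few huge outlier channels
W = rng.normal(size=(256, 128))
Q = random_orthogonal(256)

# Rotate inputs, counter-rotate weights: exact in full precision,
# but the rotated X has no extreme channels left to break quantization.
X_rot, W_rot = X @ Q, Q.T @ W
print("exactness:", np.allclose(X_rot @ W_rot, X @ W))
print("max/median channel scale before:",
      np.abs(X).max(0).max() / np.median(np.abs(X).max(0)))
print("max/median channel scale after: ",
      np.abs(X_rot).max(0).max() / np.median(np.abs(X_rot).max(0)))
```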
papers · Paper note on stabilizing 1-bit weight-and-activation training through better low-bit distribution fitting and more trustworthy gradients.
notes · Synthesis note on the recurring idea that a small subset of sensitive parameters deserves better precision than the rest.