18 items with this tag.
papers · Paper note on pushing beyond 1-bit LLM compression by factorizing weight matrices into binary latent factors with learned multi-scale compensation.
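A minimal sketch of the general factorization idea, not the paper's actual algorithm: greedy residual binarization expresses a matrix as a sum of scaled sign matrices, where each scale is the least-squares optimum for the current residual.

```python
import numpy as np

def binary_factorize(W, num_factors=3):
    """Approximate W as sum_k alpha_k * B_k with B_k in {-1, +1}.

    Greedy residual binarization (illustrative, not the paper's
    method): the sign of the residual gives each binary factor, and
    the mean absolute residual is the least-squares optimal scale.
    """
    R = W.astype(np.float64).copy()
    scales, factors = [], []
    for _ in range(num_factors):
        B = np.sign(R)
        B[B == 0] = 1.0              # avoid zeros in the binary factor
        alpha = np.abs(R).mean()     # argmin_a ||R - a*B||_F
        scales.append(alpha)
        factors.append(B)
        R -= alpha * B               # peel off this scale level
    return scales, factors

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
scales, factors = binary_factorize(W)
W_hat = sum(a * B for a, B in zip(scales, factors))
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

Each extra factor costs one bit per weight plus one scalar, which is where the multi-scale compensation framing comes from.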
frontiers · Frontier synthesis on why the next big compression wins may come from explicit byte-return accounting rather than from nominally better low-bit methods alone.
papers · Paper note on preserving a tiny set of outlier-sensitive weight columns in high precision while quantizing the rest of the model aggressively.
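A hedged sketch of that mixed-precision split, with column max-magnitude standing in for the sensitivity score (the real criterion would typically be activation-informed):

```python
import numpy as np

def quant_int4_per_col(W, levels=7):
    """Symmetric int4-style quantization, one scale per column."""
    scale = np.abs(W).max(axis=0, keepdims=True) / levels
    scale[scale == 0] = 1.0
    return np.clip(np.round(W / scale), -levels, levels) * scale

def mixed_precision(W, keep_frac=0.01):
    """Keep the most outlier-heavy columns exactly, quantize the rest.
    Column max-magnitude is an illustrative stand-in sensitivity score."""
    k = max(1, int(W.shape[1] * keep_frac))
    keep = np.argsort(np.abs(W).max(axis=0))[-k:]
    W_hat = quant_int4_per_col(W)
    W_hat[:, keep] = W[:, keep]        # protected columns stay exact
    return W_hat

rng = np.random.default_rng(1)
W = rng.normal(size=(128, 128))
W[:, 5] *= 25.0                        # plant one outlier column
base = np.linalg.norm(W)
print("plain :", np.linalg.norm(W - quant_int4_per_col(W)) / base)
print("mixed :", np.linalg.norm(W - mixed_precision(W)) / base)
```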
papers · Paper note on applying rate-distortion theory directly to language-model compression instead of treating bit allocation as a heuristic afterthought.
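A toy stand-in for the rate-distortion framing, assuming the classic high-rate proxy D_i(b) = var_i · 4^(-b) rather than anything from the paper: greedily grant each extra bit to the layer where it buys the largest distortion drop.

```python
import heapq

def allocate_bits(variances, total_bits, max_bits=8):
    """Greedy bit allocation under a total bit budget.

    Uses the high-rate proxy D_i(b) = var_i * 4**(-b): each extra bit
    keeps a quarter of the remaining distortion. Repeatedly hand one
    bit to the layer with the largest marginal distortion drop. A toy
    stand-in for a real rate-distortion formulation.
    """
    bits = [0] * len(variances)

    def gain(i):
        # distortion drop from granting layer i its next bit
        return variances[i] * 4.0 ** (-bits[i]) * 0.75

    heap = [(-gain(i), i) for i in range(len(variances))]
    heapq.heapify(heap)
    for _ in range(total_bits):
        if not heap:
            break
        _, i = heapq.heappop(heap)
        bits[i] += 1
        if bits[i] < max_bits:
            heapq.heappush(heap, (-gain(i), i))
    return bits

# sensitive layers absorb most of the budget
print(allocate_bits([10.0, 1.0, 0.1, 0.1], total_bits=12))  # [5, 3, 2, 2]
```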
papers · Paper note on compressing language-model matrices into residual low-rank structure plus a shared neural decoder over vector-quantized latent representations.
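An illustrative reduction of that recipe, with the shared neural decoder simplified to plain codebook lookup: truncated SVD captures the low-rank part, and k-means vector quantization handles the residual. All sizes here are arbitrary.

```python
import numpy as np

def compress_lowrank_plus_vq(W, rank=8, group=4, codebook_size=256, iters=10):
    """Toy 'low-rank + vector-quantized residual' compression.

    1) Truncated SVD captures the smooth low-rank structure.
    2) The residual is cut into length-`group` vectors, each replaced
       by its nearest k-means codebook entry. The paper's learned
       shared decoder is simplified here to direct codebook lookup.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    R = (W - low_rank).reshape(-1, group)      # residual as vectors

    rng = np.random.default_rng(0)
    codebook = R[rng.choice(len(R), codebook_size, replace=False)]
    for _ in range(iters):                     # plain k-means
        d = ((R**2).sum(1)[:, None] - 2.0 * R @ codebook.T
             + (codebook**2).sum(1)[None, :])
        codes = d.argmin(1)
        for c in range(codebook_size):
            members = R[codes == c]
            if len(members):
                codebook[c] = members.mean(0)
    return low_rank + codebook[codes].reshape(W.shape)

rng = np.random.default_rng(2)
W = rng.normal(size=(64, 256))
W_hat = compress_lowrank_plus_vq(W)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```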
frontiers · Frontier synthesis on why the next gains may come from allocating scarce high-fidelity bytes intelligently rather than chasing a single global quantization regime.
hypotheses · Hypothesis that an extra RMSNorm before projections improves post-roundtrip quality by stabilizing low-bit training and export.
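A sketch of the hypothesized pattern in PyTorch: an extra RMSNorm directly in front of a projection, so any downstream quantizer sees inputs at a controlled scale. The module names are made up for illustration.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Standard RMSNorm: x * g / rms(x)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.g = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.g

class NormedLinear(nn.Module):
    """Linear projection with an extra RMSNorm in front, so an
    (eventual) low-bit quantizer always sees unit-scale inputs.
    Illustrative of the hypothesis, not any specific model's code."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.norm = RMSNorm(d_in)
        self.proj = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        return self.proj(self.norm(x))

x = torch.randn(2, 16, 512) * 30.0   # wildly scaled activations
y = NormedLinear(512, 512)(x)        # projection input is re-normalized
print(y.shape)
```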
hypotheses · Hypothesis that protecting a tiny subset of highly sensitive parameters buys disproportionately large quality gains under a strict artifact cap.
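One way to make that hypothesis concrete, with |w| as a loudly assumed stand-in for true sensitivity: pull the top fraction of weights into an exact sparse side table and quantize the dense remainder.

```python
import numpy as np

def quant_absmax(W, levels=7):
    """Symmetric quantization with a single per-tensor scale."""
    scale = np.abs(W).max() / levels
    if scale == 0:
        scale = 1.0
    return np.clip(np.round(W / scale), -levels, levels) * scale

def protect_topk(W, frac=0.005, levels=7):
    """Store the top `frac` of weights by magnitude exactly and
    quantize the dense remainder. |w| stands in for true sensitivity
    (real methods use activation- or Hessian-based scores)."""
    k = max(1, int(W.size * frac))
    idx = np.argpartition(np.abs(W).ravel(), -k)[-k:]
    mask = np.zeros(W.size, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(W.shape)
    dense = np.where(mask, 0.0, W)        # outliers pulled out
    return np.where(mask, W, quant_absmax(dense, levels))

rng = np.random.default_rng(3)
W = rng.standard_t(2, size=(256, 256))    # heavy-tailed weights
base = np.linalg.norm(W)
print("plain    :", np.linalg.norm(W - quant_absmax(W)) / base)
print("protected:", np.linalg.norm(W - protect_topk(W)) / base)
```

On heavy-tailed weights the handful of protected outliers also stops them from inflating the shared quantization scale, which is where the disproportionate gain comes from.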
lanes · The lane focused on reducing the gap between train-time weights and the final compressed artifact.
notes · Why pre-projection normalization is a recurring pattern in low-bit and compression-aware transformer design.
notes · Concept note on why outliers dominate low-bit failure and why most serious compression methods end up treating them specially.
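The failure mode is easy to reproduce: with a shared absmax scale, a single outlier coarsens the quantization grid for every other value.

```python
import numpy as np

def absmax_int4(x):
    """Symmetric int4 quantization with one scale for the whole tensor."""
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -7, 7) * scale

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
x_out = x.copy()
x_out[0] = 100.0                     # a single outlier

# error on the 999 "normal" entries, with and without the outlier
e_clean = np.abs(x[1:] - absmax_int4(x)[1:]).mean()
e_dirty = np.abs(x_out[1:] - absmax_int4(x_out)[1:]).mean()
print(f"mean error without outlier: {e_clean:.4f}")
print(f"mean error with outlier:    {e_dirty:.4f}")
```

One outlier stretches the scale to 100/7, so most normal entries round straight to zero: an order-of-magnitude error increase on values the outlier never touched.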
papers · Paper note on activation-aware weight quantization and the claim that a tiny set of salient channels dominates low-bit error.
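A sketch of the scaling trick at the heart of the activation-aware approach, under simplifying assumptions (per-column scales and one fixed factor s instead of a searched per-channel scale): scale salient input channels up before quantizing and fold the inverse into the activations, which is exact in full precision.

```python
import numpy as np

def quant_int4_per_col(W, levels=7):
    """Symmetric int4 quantization, one scale per output column."""
    scale = np.abs(W).max(axis=0, keepdims=True) / levels
    scale[scale == 0] = 1.0
    return np.clip(np.round(W / scale), -levels, levels) * scale

def awq_style(W, X, top_frac=0.02, s=2.0):
    """Scale the most activation-salient input channels of W up by s
    before quantizing, then fold 1/s into the activations:
    (X / s_vec) @ (s_vec[:, None] * W) == X @ W in full precision.
    Illustrative only; the real method searches per-channel scales
    and quantizes group-wise."""
    saliency = np.abs(X).mean(axis=0)              # per input channel
    k = max(1, int(len(saliency) * top_frac))
    top = np.argsort(saliency)[-k:]
    s_vec = np.ones(W.shape[0])
    s_vec[top] = s
    W_q = quant_int4_per_col(W * s_vec[:, None])   # salient rows protected
    return (X / s_vec) @ W_q

rng = np.random.default_rng(5)
X = rng.normal(size=(64, 256))
X[:, :5] *= 20.0                                   # a few salient channels
W = rng.normal(size=(256, 256))
ref = X @ W
print("plain:", np.linalg.norm(ref - X @ quant_int4_per_col(W)))
print("awq  :", np.linalg.norm(ref - awq_style(W, X)))
```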
papers · Paper note on training ternary 1.58-bit language models from scratch and why ultra-low-bit modeling should be treated as a native design regime.
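The weight quantizer behind this line of work is compact enough to state directly: absmean scaling followed by rounding into {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits per weight. A standalone sketch; in the actual models this runs inside the training forward pass, not as post-hoc rounding.

```python
import numpy as np

def ternary_absmean(W, eps=1e-8):
    """Ternary (1.58-bit) weight quantization in the absmean style:
    scale by the mean absolute value, then round to {-1, 0, +1}.
    Post-hoc here for illustration; the models train with this
    quantizer in the loop (quantization-aware)."""
    gamma = np.abs(W).mean() + eps
    W_t = np.clip(np.round(W / gamma), -1, 1)
    return W_t, gamma                       # stored weights and scale

rng = np.random.default_rng(6)
W = rng.normal(size=(8, 8))
W_t, gamma = ternary_absmean(W)
print(W_t)                                  # entries in {-1, 0, 1}
print("dequantized row:", (W_t * gamma)[0])
```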
papers · Paper note on the claim that an extra RMSNorm before linear projections is a disproportionately strong stabilizer for extreme low-bit finetuning.
papers · Paper note on decoupled low-bit training with a tiny high-precision branch for the parameters that matter most.
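A generic sketch of the decoupling idea, not the paper's exact parameterization: a fake-quantized main path trained with a straight-through estimator, plus a small full-precision low-rank branch that carries the most important directions.

```python
import torch
import torch.nn as nn

class DecoupledLowBitLinear(nn.Module):
    """Low-bit main path plus a tiny full-precision side branch.

    The dense weight passes through a ternary fake-quantizer with a
    straight-through estimator (STE); a small low-rank branch stays
    in full precision. Illustrative of 'decoupled' training in
    general, not the specific paper's design.
    """
    def __init__(self, d_in, d_out, rank=4):
        super().__init__()
        self.w = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        self.a = nn.Parameter(torch.randn(rank, d_in) * 0.02)  # fp branch
        self.b = nn.Parameter(torch.zeros(d_out, rank))

    def quantize(self, w):
        # ternary absmean fake-quant with straight-through gradients
        gamma = w.abs().mean().clamp_min(1e-8)
        w_q = (w / gamma).round().clamp(-1, 1) * gamma
        return w + (w_q - w).detach()   # forward: w_q, backward: identity

    def forward(self, x):
        main = x @ self.quantize(self.w).t()
        side = (x @ self.a.t()) @ self.b.t()   # high-precision correction
        return main + side

layer = DecoupledLowBitLinear(64, 64)
x = torch.randn(32, 64)
loss = layer(x).pow(2).mean()
loss.backward()                        # gradients flow through the STE
print(layer.w.grad.abs().mean().item())
```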
papers · Paper note on using rotations to remove hidden-state outliers so that weights, activations, and KV cache can all be quantized more uniformly.
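The core identity is checkable in a few lines: for orthogonal Q, (XQ)(QᵀW) = XW exactly in full precision, while the rotation smears outlier channels across all dimensions. A random QR rotation is used here for illustration; the actual methods fold structured (e.g. Hadamard) rotations into adjacent layers.

```python
import numpy as np

def random_orthogonal(d, seed=0):
    """Random orthogonal matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return Q

rng = np.random.default_rng(7)
X = rng.normal(size=(512, 256))        # hidden states
X[:, :3] *= 50.0                       # a few huge outlier channels
W = rng.normal(size=(256, 128))
Q = random_orthogonal(256)

# Rotate inputs, counter-rotate weights: exact in full precision,
# but the rotated X has no extreme channels left to break quantization.
X_rot, W_rot = X @ Q, Q.T @ W
print("exactness:", np.allclose(X_rot @ W_rot, X @ W))
print("max/median channel scale before:",
      np.abs(X).max(0).max() / np.median(np.abs(X).max(0)))
print("max/median channel scale after: ",
      np.abs(X_rot).max(0).max() / np.median(np.abs(X_rot).max(0)))
```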
papers · Paper note on stabilizing 1-bit weight-and-activation training through better low-bit distribution fitting and more trustworthy gradients.
notes · Synthesis note on the recurring idea that a small subset of sensitive parameters deserves better precision than the rest.