This page is a compact selection of material local turning points, not a full dump of every run.
Early bootstrap milestone
96804a4 — first real baseline after pipeline unblocking
- local baseline reached roughly val_bpb=4.1755
- the main value was not the score itself
- the real win was proving the pipeline could finally train and emit a valid artifact-facing metric
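
For readers unfamiliar with the metric, here is a minimal sketch of how a bits-per-byte figure is commonly derived. It assumes val_bpb means validation bits per byte, computed from a summed negative log-likelihood in nats. The function name and the sample numbers are hypothetical and are not taken from the pipeline.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Divide by ln(2) to convert nats to bits, then normalize by byte count.
    return total_nll_nats / (math.log(2) * total_bytes)


# Hypothetical numbers: 1000 bytes of text with a summed NLL of 2000 nats.
print(round(bits_per_byte(2000.0, 1000), 4))
```

Normalizing by bytes rather than tokens keeps the number comparable across tokenizers, which is probably why it suits an artifact-facing metric.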
Breadth-loop shaping milestones
d05784f — capped validation prefix + faster breadth loop
- improved breadth from about 2.8822 to 2.7498
- established that a useful cheap proxy was achievable
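
The idea of a capped validation prefix can be sketched as follows, assuming it means scoring only the first few batches of the validation stream instead of the whole set. `capped_validation_loss` and `max_batches` are hypothetical names, and the per-batch losses are made up for illustration.

```python
from itertools import islice


def capped_validation_loss(batch_losses, max_batches=None):
    """Average per-batch losses over at most max_batches batches.

    max_batches=None evaluates the whole stream (the uncapped case).
    """
    losses = list(islice(batch_losses, max_batches))
    if not losses:
        raise ValueError("no validation batches were evaluated")
    return sum(losses) / len(losses)


# Hypothetical stream of per-batch losses; only the first two are scored.
print(capped_validation_loss(iter([2.0, 4.0, 100.0]), max_batches=2))
```

A fixed prefix keeps the cheap profile deterministic from run to run, so differences between candidates are less likely to come from the evaluation subset.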
62ac127 / 26161d4 / d14f64b — schedule sweep
- breadth improved in steps from about 2.75 → 2.64 → 2.58 → 2.53
- this is one of the clearest examples of boring-but-real research infrastructure work paying off
Confirm-loop strengthening milestones
a1f08d5 — stronger confirm schedule
- confirm improved from about 2.5548 to 2.5100
- helped stabilize the cheap decision layer above the breadth profile
Compression-aware training milestones
79f164b — GRAD_CLIP_NORM=1.0
- confirm improved from about 2.5100 to 2.4740
- one of the clearest signals that training-side robustness mattered more than many narrow export tweaks
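
As a reminder of what a setting like GRAD_CLIP_NORM=1.0 usually controls, here is a minimal sketch of global-norm gradient clipping in plain Python. It assumes the flag is a cap on the global L2 norm of all gradients, which is the common convention. The helper name and the toy gradients are hypothetical, and this is not the project's training code.

```python
import math


def clip_grad_norm(grads, max_norm=1.0, eps=1e-6):
    """Rescale gradient vectors so their global L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Gradients already within the cap are left untouched (scale == 1.0).
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm


# Toy gradients with global norm 5.0 are rescaled to roughly norm 1.0.
clipped, norm_before = clip_grad_norm([[3.0], [4.0]], max_norm=1.0)
print(norm_before, clipped)
```

Because every gradient is scaled by the same factor, the update direction is preserved and only its magnitude is bounded.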
54cef9e — TIED_EMBED_LR=0.04
- confirm improved to about 2.4655
- also slightly reduced bytes
b1de211 — SCALAR_LR=0.035
- confirm improved further to about 2.4605
ceaf597 → 96b2571 → 50ddef2 → a095cda
- optimizer smoothing ladder: BETA2 rose from 0.97 to 0.98 to 0.99 to 0.995
- confirm improved stepwise from about 2.4538 → 2.4484 → 2.4465 → 2.4398
- this is one of the strongest clean local trajectories in the logs
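
One way to read the smoothing ladder, assuming BETA2 is an Adam-style decay rate for the second-moment moving average: the average spans roughly 1 / (1 - beta2) recent steps. The sketch below only computes that rule-of-thumb horizon for the four values above. `ema_horizon` is a hypothetical helper, not part of the training code.

```python
def ema_horizon(beta2: float) -> float:
    """Approximate number of recent steps an EMA with decay beta2 averages over."""
    if not 0.0 <= beta2 < 1.0:
        raise ValueError("beta2 must be in [0, 1)")
    return 1.0 / (1.0 - beta2)


# Horizons for the ladder: about 33, 50, 100, and 200 steps respectively.
for beta2 in (0.97, 0.98, 0.99, 0.995):
    print(beta2, round(ema_horizon(beta2), 1))
```

Under that reading, each rung lengthens the window over which squared gradients are averaged, which is consistent with describing the ladder as smoothing.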
Full-validation milestones
42f3e58 — first strong full validation
- promoted the improved compression-aware setup to PROFILE=full
- reached roughly val_bpb=2.3652
- validated that the confirm gains were not just cheap-proxy noise
eea5571 — improved full candidate
- full val_bpb improved further to about 2.3548
- artifact stayed well under cap
AlphaXiv-driven architecture milestone
6cf5b46 — extra RMSNorm breadth win
- breadth improved from about 2.6491 to 2.6108
- translated literature into a concrete local architectural win
38ff505 — extra RMSNorm confirm win
- confirm improved from about 2.4398 to 2.4260
- one of the clearest examples of the knowledge garden producing a benchmark-relevant idea
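
For reference, the RMSNorm operation itself can be sketched in a few lines of plain Python. The logs do not say where the extra norm was inserted in the model, so this shows only the standard operation. The function name, `eps` value, and optional `weight` gain are illustrative assumptions.

```python
import math


def rms_norm(x, weight=None, eps=1e-6):
    """Scale a vector to unit root-mean-square, then apply an optional per-element gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(x)  # identity gain when no learned weight is given
    return [v * inv_rms * w for v, w in zip(x, weight)]


# After normalization the vector has a root-mean-square of about 1.0.
print(rms_norm([3.0, 4.0]))
```

Unlike LayerNorm, RMSNorm does not subtract the mean, so it only rescales activations. That makes it a cheap stabilizer to add at an extra point in a network.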
What these turning points say
The strongest local wins so far cluster around:
- making the proxy loop faithful enough to use
- improving training-side robustness
- smoothing optimization
- importing the right architectural stabilizer from recent literature
That is a much richer internal story than the small public run record alone would suggest.