This page is a compact selection of the most significant local turning points, not a full dump of every run.

Early bootstrap milestone

96804a4 — first real baseline after pipeline unblocking

  • local baseline reached roughly val_bpb=4.1755
  • the main value was not the score itself
  • the real win was proving the pipeline could finally train and emit a valid artifact-facing metric
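
For readers unfamiliar with the metric, here is a minimal sketch of how a bits-per-byte figure is typically derived. It assumes val_bpb means validation bits per byte, which this page does not define; the function name and arguments are hypothetical and are not taken from the pipeline.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) into bits per byte.

    Assumption: the loss is summed over every predicted token of the
    validation text, and total_bytes is the byte length of that same text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes so that different tokenizers remain comparable.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes rather than tokens is what makes the number comparable across tokenizer changes.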

Breadth-loop shaping milestones

d05784f — capped validation prefix + faster breadth loop

  • improved breadth from about 2.8822 to 2.7498
  • established that a useful cheap proxy was achievable
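
The idea of a capped validation prefix can be sketched as follows. This is an illustration under assumptions, not the code from d05784f: the names capped_validation_loss, batch_loss, and max_batches are hypothetical, and the real loop may cap by tokens or bytes rather than by batches.

```python
from itertools import islice
from typing import Callable, Iterable, TypeVar

Batch = TypeVar("Batch")


def capped_validation_loss(
    batches: Iterable[Batch],
    batch_loss: Callable[[Batch], float],
    max_batches: int,
) -> float:
    """Average the loss over only the first max_batches validation batches.

    Taking a fixed prefix (rather than a random sample) keeps the proxy
    deterministic, so scores from different runs stay comparable.
    """
    total = 0.0
    count = 0
    for batch in islice(batches, max_batches):
        total += batch_loss(batch)
        count += 1
    if count == 0:
        raise ValueError("no validation batches were evaluated")
    return total / count
```

The trade-off is speed against fidelity: a shorter prefix makes each breadth run cheaper but makes the score a noisier stand-in for full validation.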

62ac127 / 26161d4 / d14f64b — schedule sweep

  • breadth improved in steps from about 2.75 to 2.64 to 2.58 to 2.53
  • this is one of the clearest examples of boring-but-real research infrastructure work paying off

Confirm-loop strengthening milestones

a1f08d5 — stronger confirm schedule

  • confirm improved from about 2.5548 to 2.5100
  • helped stabilize the cheap decision layer above the breadth profile

Compression-aware training milestones

79f164b — GRAD_CLIP_NORM=1.0

  • confirm improved from about 2.5100 to 2.4740
  • one of the clearest signals that training-side robustness mattered more than many narrow export tweaks
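
As a reminder of what a clip norm of 1.0 does, here is a minimal framework-free sketch of clipping by global gradient norm. It assumes GRAD_CLIP_NORM is a global-norm threshold, which the log does not spell out; the function name and the eps default are illustrative choices.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(
    grads: List[List[float]], max_norm: float = 1.0, eps: float = 1e-6
) -> Tuple[List[List[float]], float]:
    """Rescale all gradients together so their joint L2 norm is at most max_norm.

    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Gradients already inside the threshold are left untouched (scale = 1.0).
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm
```

Because every tensor is scaled by the same factor, the update direction is preserved and only its length is bounded.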

54cef9e — TIED_EMBED_LR=0.04

  • confirm improved to about 2.4655
  • also slightly reduced bytes

b1de211 — SCALAR_LR=0.035

  • confirm improved further to about 2.4605

ceaf597 / 96b2571 / 50ddef2 / a095cda

  • optimizer smoothing ladder: BETA2 rose from 0.97 to 0.98 to 0.99 to 0.995
  • confirm improved stepwise from about 2.4538 to 2.4484 to 2.4465 to 2.4398
  • this is one of the strongest clean local trajectories in the logs
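
One way to read the BETA2 ladder is through the averaging horizon of an exponential moving average. The sketch below assumes BETA2 is an Adam-style second-moment decay rate, which the log does not state; ema_horizon is a hypothetical helper, and 1 / (1 - beta2) is the usual rule of thumb rather than an exact window.

```python
from typing import Dict, List


def ema_horizon(beta2: float) -> float:
    """Approximate number of recent steps an EMA with decay beta2 averages over."""
    if not 0.0 <= beta2 < 1.0:
        raise ValueError("beta2 must be in [0, 1)")
    return 1.0 / (1.0 - beta2)


# The four values from the ladder above.
LADDER: List[float] = [0.97, 0.98, 0.99, 0.995]
HORIZONS: Dict[float, float] = {b: round(ema_horizon(b), 1) for b in LADDER}
```

Under that assumption, each rung of the ladder lengthens the window over which squared gradients are averaged, from roughly 33 steps at 0.97 to roughly 200 steps at 0.995, which is consistent with the "smoothing" label.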

Full-validation milestones

42f3e58 — first strong full validation

  • promoted the improved compression-aware setup to PROFILE=full
  • reached roughly val_bpb=2.3652
  • validated that the confirm gains were not just cheap-proxy noise

eea5571 — improved full candidate

  • full val_bpb improved further to about 2.3548
  • artifact stayed well under cap

AlphaXiv-driven architecture milestone

6cf5b46 — extra RMSNorm breadth win

  • breadth improved from about 2.6491 to 2.6108
  • translated literature into a concrete local architectural win

38ff505 — extra RMSNorm confirm win

  • confirm improved from about 2.4398 to 2.4260
  • one of the clearest examples of the knowledge garden producing a benchmark-relevant idea
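
For reference, the RMSNorm operation itself can be sketched in a few lines of plain Python. This shows only the normalization; where the extra RMSNorm was inserted in the model is not recorded on this page, and the function name, the optional weight argument, and the eps default are illustrative assumptions.

```python
import math
from typing import List, Optional


def rms_norm(
    x: List[float], weight: Optional[List[float]] = None, eps: float = 1e-6
) -> List[float]:
    """Divide a vector by its root-mean-square, then apply an optional gain.

    Unlike LayerNorm, RMSNorm does not subtract the mean first.
    """
    if not x:
        raise ValueError("x must be non-empty")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(x)  # identity gain
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

The output is nearly invariant to the scale of the input, which is a plausible reason such a layer acts as a stabilizer.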

What these turning points say

The strongest local wins so far cluster around:

  • making the proxy loop faithful enough to use
  • improving training-side robustness
  • smoothing optimization
  • importing the right architectural stabilizer from recent literature

That is a much richer internal story than the small public run record alone would suggest.