This page summarizes the local research chronology already visible in this repository’s kept commits and autoresearch.jsonl.

It is intentionally distinct from the challenge history, which summarizes only the thin public record.

Phase 0: make the loop real at all

The earliest local work was not about squeezing the last few basis points out of the model. It was about making a reproducible local loop exist.

Important turning points included:

  • artifact-phase observability
  • benchmark supervision / timeouts
  • enough schedule simplification to get valid runs instead of repeated timeouts
  • fixing the Metal graph / evaluation plumbing so the trainer stopped stalling

This is the phase that made later research possible at all.

Phase 1: proxy-loop shaping won before fancy modeling did

Once the pipeline ran, one of the biggest early sources of improvement was simply making the local proxy useful.

The broad direction was:

  • add capped validation prefixes for cheap breadth runs
  • make breadth fast enough to iterate on
  • then step up the schedule gradually rather than guessing the final shape all at once
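The capped-prefix idea can be sketched minimally. This is an illustrative fragment, not code from the repo: the names `capped_val` and `VAL_PREFIX_BYTES` are assumptions, and the cap value is a placeholder.

```python
# Hypothetical sketch: cap the validation data to a fixed byte prefix so
# breadth runs stay cheap; confirm runs would use the full set.
VAL_PREFIX_BYTES = 1_000_000  # placeholder breadth cap

def capped_val(data: bytes, cap: int = VAL_PREFIX_BYTES) -> bytes:
    """Return a fixed prefix of the validation bytes for cheap proxy evals."""
    return data[:cap]
```

The point is only that a deterministic prefix keeps breadth comparisons consistent run-to-run while cutting evaluation cost.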

Representative kept improvements from autoresearch.jsonl:

  • breadth improved from roughly 2.88 → 2.75 → 2.64 → 2.58 → 2.53
  • confirm then improved from roughly 2.55 to 2.51

That phase matters because it established a core local lesson:

better research throughput and a saner proxy loop were worth more than early speculative architecture changes.

Phase 2: compression-aware training tweaks beat many export-only tweaks

After the loop stabilized, the strongest local wins came from training-side robustness rather than from broad artifact-format surgery.

Especially important kept changes were:

  • GRAD_CLIP_NORM=1.0
  • lowering TIED_EMBED_LR to 0.04
  • lowering SCALAR_LR to 0.035
  • optimizer smoothing via BETA2 rising through 0.97, 0.98, 0.99, 0.995

These moves repeatedly beat many small export-only variants.
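The kept training-side values above can be collected in one place. Only the numbers come from the log; the dict layout and how it would be wired into the trainer are assumptions.

```python
# Illustrative summary of the kept training-side tweaks (values from the
# log entries above; integration details are not recorded here).
TRAIN_CONFIG = {
    "GRAD_CLIP_NORM": 1.0,    # global gradient-norm clip
    "TIED_EMBED_LR": 0.04,    # lowered tied-embedding learning rate
    "SCALAR_LR": 0.035,       # lowered scalar-parameter learning rate
    "BETA2": 0.995,           # final step of the optimizer-smoothing ladder
}
```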

Phase 3: many selective export tweaks mostly failed or stayed flat

The logs also show a useful negative result: a lot of obvious compression-side interventions did not clearly beat the stronger training-side setup.

Examples that mostly failed, regressed, or stayed flat:

  • clip-percentile nudges with little material benefit
  • fp16 passthrough for medium-size KV projections
  • fp16 passthrough for tied embeddings
  • no-clipping variants
  • several narrowly targeted protected-tensor tests

This phase is important because it sharpens the real research question. The challenge is not solved by sprinkling high precision onto arbitrary tensors.

Phase 4: optimizer smoothing became a real local frontier

One of the cleanest confirmed local trajectories was the optimizer-smoothing ladder:

  • BETA2 0.97
  • BETA2 0.98
  • BETA2 0.99
  • BETA2 0.995

Each step improved confirm val_bpb in the logs while holding bytes roughly steady or slightly better.
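Why raising BETA2 smooths training can be shown with a toy second-moment update. This is a generic Adam-style sketch under the assumption the optimizer uses an exponential moving average of squared gradients; it is not the repo's optimizer code.

```python
# Adam's second moment v is an EMA of squared gradients; a larger BETA2
# averages over a longer effective window of roughly 1 / (1 - BETA2) steps.
def second_moment(grads, beta2):
    v = 0.0
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
    return v

# Effective averaging windows for the ladder:
windows = {b2: round(1.0 / (1.0 - b2)) for b2 in (0.97, 0.98, 0.99, 0.995)}
# → roughly 33, 50, 100, and 200 steps respectively
```

So each rung of the ladder roughly doubles the smoothing window, which is consistent with the gradual, monotone confirm improvements seen in the logs.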

That makes optimizer smoothing part of the actual local lineage, not just an implementation detail.

Phase 5: speculative shared-depth work was promising but fragile

Local history also includes a clear speculative branch around shared depth and recurrence.

What this branch produced is not captured in the kept run lineage. The important correction is that the lane is not imaginary: it has already been explored locally, even if it has not yet yielded the cleanest kept runs.

Phase 6: AlphaXiv-driven architecture reading produced a concrete win

A later local phase explicitly mined recent papers and converted them into experiment ideas.

The clearest success so far is Extra RMSNorm, which produced:

  • a breadth improvement from about 2.6491 to 2.6108
  • then a confirm improvement from about 2.4398 to 2.4260

This is one of the strongest examples in the repo of the KB actually feeding the benchmark loop.
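For reference, RMSNorm itself is a standard layer; a minimal sketch follows. Where exactly the "extra" norm was inserted in the architecture is not recorded here, so this only illustrates the operation, not the winning placement.

```python
import math

# Standard RMSNorm: normalize by the root-mean-square of the activations,
# then apply a learned elementwise scale.
def rmsnorm(x, weight, eps=1e-6):
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```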

What this chronology says overall

The local lineage so far suggests a sequence:

  1. make the loop runnable
  2. make the proxy meaningful
  3. win with training-side robustness before exotic export tricks
  4. keep recurrence/shared-depth alive as a real but fragile lane
  5. use literature synthesis to guide architecture tweaks with better priors