This page records local failures worth remembering so the garden does not quietly rediscover them.

It is not a complete crash log. It is a research memory page.

Category 1: export-side tweaks that looked sensible but mostly lost

The logs contain many attempts that spent bytes on targeted fp16 passthrough or on small quantization tweaks without producing a convincing improvement in the confirm metric.

Examples include:

  • fp32 per-row scales instead of fp16
  • fp16 passthrough for medium-size KV projections
  • fp16 passthrough for the tied embedding matrix
  • clip-percentile nudges with little material benefit
  • no-clipping or embedding-only no-clipping variants
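To make the pattern concrete, here is a minimal sketch of what "spending bytes on passthrough" means at export time. All names (the passthrough list, `export_tensor`) are illustrative, not the repo's actual API: tensors on a passthrough list are stored as fp16 directly, everything else goes through int8 per-row quantization with fp16 scales. The failed experiments above amount to growing that passthrough list, or widening the scales to fp32, without a matching confirm win.

```python
import numpy as np

def quantize_per_row(w):
    """int8 per-row symmetric quantization with fp16 scales (sketch)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-8)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def export_tensor(name, w, passthrough=("embed.weight", "kv_proj.weight")):
    """Store passthrough tensors as fp16; quantize the rest.

    The passthrough tuple is hypothetical -- it stands in for the
    'protect this tensor' tweaks the log kept rejecting."""
    if name in passthrough:
        return {"kind": "fp16", "data": w.astype(np.float16)}
    q, scale = quantize_per_row(w)
    return {"kind": "int8", "data": q, "scale": scale}
```

Each tensor moved onto the passthrough list doubles its storage relative to int8, which is exactly the byte cost the confirm runs failed to pay back.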

Lesson

The local loop repeatedly suggested that training-side robustness mattered more than ad hoc protection of arbitrary tensors.

Category 2: fragile shared-depth plumbing

The shared-depth / recurrent branch has already been explored enough to teach several implementation lessons.

Observed failures included:

  • raw artifact-save crashes (std::bad_cast)
  • stale function-signature mismatches after regularized-objective changes
  • non-array metadata leaking into saved module state
  • recurring fragility when shared-depth plumbing interacted with the artifact path
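A minimal sketch of the "serialization as a first-class constraint" idea: before anything reaches the artifact path, split the module state into array fields and non-array metadata, and fail loudly rather than let metadata leak into the saved file. The function names and state keys here are hypothetical, not the repo's.

```python
import numpy as np

def sanitize_state(state):
    """Partition a state dict into saveable arrays and rejected metadata."""
    arrays, rejected = {}, {}
    for key, value in state.items():
        if isinstance(value, np.ndarray):
            arrays[key] = value
        else:
            rejected[key] = type(value).__name__
    return arrays, rejected

def save_artifact(path, state):
    """Save arrays-only state; refuse to serialize anything else."""
    arrays, rejected = sanitize_state(state)
    if rejected:
        # Fail fast: this is the class of leak that produced the
        # std::bad_cast-style crashes above.
        raise TypeError(f"non-array state would leak into artifact: {rejected}")
    np.savez(path, **arrays)
```

Failing at the sanitize step turns a crash deep inside the artifact writer into an immediate, attributable error at the call site.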

Lesson

The architecture lane is still alive, but the current repo history says it is implementation-fragile. A future pass should treat serialization and state structure as first-class design constraints, not afterthoughts.

Category 3: tiny confirm deltas that were not worth keeping

The logs also show an important hygiene rule: many changes did improve a metric slightly, but not enough to justify keeping them once bytes, runtime, or robustness were counted against the gain.

Examples:

  • some clip-percentile changes
  • more aggressive clipping
  • stronger clip norm (0.5) when the gain was too small for the byte/runtime trade
  • longer warmdown when runtime rose sharply for a tiny confirm delta
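The hygiene rule above can be sketched as an explicit keep/reject check. The thresholds and cost weights here are illustrative placeholders, not values from the logs; the point is only that a candidate survives when its confirm delta clears a materiality bar after byte and runtime costs are subtracted.

```python
def keep_change(confirm_delta, extra_bytes, extra_runtime_s,
                min_delta=0.002, byte_cost=1e-8, runtime_cost=1e-4):
    """Return True only if the net, cost-adjusted gain is material.

    All thresholds are hypothetical: min_delta is the materiality bar,
    byte_cost and runtime_cost convert resource spend into metric units.
    """
    net = (confirm_delta
           - extra_bytes * byte_cost
           - extra_runtime_s * runtime_cost)
    return net >= min_delta
```

Under this framing, "stronger clip norm with a too-small gain" and "longer warmdown with sharply higher runtime" are both rejected by the same rule: the raw delta was positive, but the net one was not.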

Lesson

A compact-model benchmark can generate lots of numerical motion that is not actually a research win. This is why the log kept rejecting non-material deltas.

Category 4: process failures that still mattered

Some early runs failed for procedural reasons rather than conceptual ones:

  • benchmark timeouts before valid metrics
  • shell parse errors
  • missing scripts in post-run checks
  • runs killed mid-training or mid-eval
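One way to make the loop itself trustworthy is to treat every run as suspect until it both exits cleanly and emits parseable metrics. This is a hypothetical sketch, not the repo's harness: it assumes the benchmark prints a JSON metrics line as its final stdout line, and it converts timeouts, nonzero exits, and missing metrics into explicit failure records instead of half-valid results.

```python
import json
import shlex
import subprocess

def run_benchmark(cmd, timeout_s=3600):
    """Run a benchmark command; accept only clean exits with valid metrics."""
    try:
        proc = subprocess.run(shlex.split(cmd), capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Covers 'benchmark timeouts before valid metrics'.
        return {"ok": False, "reason": "timeout"}
    if proc.returncode != 0:
        # Covers killed runs and script errors.
        return {"ok": False, "reason": f"exit {proc.returncode}"}
    try:
        # Assumed convention: last stdout line is a JSON metrics dict.
        metrics = json.loads(proc.stdout.strip().splitlines()[-1])
    except (ValueError, IndexError):
        return {"ok": False, "reason": "no valid metrics"}
    return {"ok": True, "metrics": metrics}
```

Every procedural failure mode in the list above then shows up as a structured rejection, which is what makes downstream modeling conclusions trustworthy.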

Lesson

These are not exciting, but they shaped the real local lineage. Until the loop itself was trustworthy, stronger modeling conclusions would have been premature.

What this page is trying to prevent

Without a page like this, later readers can easily over-remember only the kept wins and under-remember the shape of the search space that already collapsed.

This page therefore exists to preserve three local facts:

  1. many export-only interventions were not enough
  2. shared-depth work is promising but currently brittle
  3. not every small metric improvement deserves to survive