This page records local failures worth remembering so the garden does not quietly rediscover them.

It is not a complete crash log. It is a research memory page.

Category 1: export-side tweaks that looked sensible but mostly lost

The logs contain many attempts that spent bytes on targeted fp16 passthrough or on small quantization tweaks without producing a convincing improvement in the confirm metric.

Examples include:

  • fp32 per-row scales instead of fp16
  • fp16 passthrough for medium-size KV projections
  • fp16 passthrough for the tied embedding matrix
  • clip-percentile nudges with little material benefit
  • no-clipping or embedding-only no-clipping variants
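To make the pattern concrete, here is a minimal sketch of what "spending bytes on passthrough" means at export time. All names (the passthrough list, `export_tensor`) are illustrative, not the repo's actual API: tensors on a passthrough list are stored as fp16 directly, everything else goes through int8 per-row quantization with fp16 scales. The failed experiments above amount to growing that passthrough list, or widening the scales to fp32, without a matching confirm win.

```python
import numpy as np

def quantize_per_row(w):
    """int8 per-row symmetric quantization with fp16 scales (sketch)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-8)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def export_tensor(name, w, passthrough=("embed.weight", "kv_proj.weight")):
    """Store passthrough tensors as fp16; quantize the rest.

    The passthrough tuple is hypothetical -- it stands in for the
    'protect this tensor' tweaks the log kept rejecting."""
    if name in passthrough:
        return {"kind": "fp16", "data": w.astype(np.float16)}
    q, scale = quantize_per_row(w)
    return {"kind": "int8", "data": q, "scale": scale}
```

Each tensor moved onto the passthrough list doubles its storage relative to int8, which is exactly the byte cost the confirm runs failed to pay back.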

Lesson

The local loop repeatedly suggested that training-side robustness mattered more than ad hoc protection of arbitrary tensors.

Category 2: fragile shared-depth plumbing

The shared-depth / recurrent branch has already been explored enough to teach several implementation lessons.

Observed failures included:

  • raw artifact-save crashes (std::bad_cast)
  • stale function-signature mismatches after regularized-objective changes
  • non-array metadata leaking into saved module state
  • recurring fragility when shared-depth plumbing interacted with the artifact path
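A minimal sketch of the "serialization as a first-class constraint" idea: before anything reaches the artifact path, split the module state into array fields and non-array metadata, and fail loudly rather than let metadata leak into the saved file. The function names and state keys here are hypothetical, not the repo's.

```python
import numpy as np

def sanitize_state(state):
    """Partition a state dict into saveable arrays and rejected metadata."""
    arrays, rejected = {}, {}
    for key, value in state.items():
        if isinstance(value, np.ndarray):
            arrays[key] = value
        else:
            rejected[key] = type(value).__name__
    return arrays, rejected

def save_artifact(path, state):
    """Save arrays-only state; refuse to serialize anything else."""
    arrays, rejected = sanitize_state(state)
    if rejected:
        # Fail fast: this is the class of leak that produced the
        # std::bad_cast-style crashes above.
        raise TypeError(f"non-array state would leak into artifact: {rejected}")
    np.savez(path, **arrays)
```

Failing at the sanitize step turns a crash deep inside the artifact writer into an immediate, attributable error at the call site.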

Lesson

The architecture lane is still alive, but the current repo history says it is implementation-fragile. A future pass should treat serialization and state structure as first-class design constraints, not afterthoughts.

Category 3: tiny confirm deltas that were not worth keeping

The logs also show an important hygiene rule: many changes did improve a metric slightly, but not enough to justify keeping them once bytes, runtime, or robustness were counted against the gain.

Examples:

  • some clip-percentile changes
  • more aggressive clipping
  • stronger clip norm (0.5) when the gain was too small for the byte/runtime trade
  • longer warmdown when runtime rose sharply for a tiny confirm delta
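The hygiene rule above can be sketched as an explicit keep/reject check. The thresholds and cost weights here are illustrative placeholders, not values from the logs; the point is only that a candidate survives when its confirm delta clears a materiality bar after byte and runtime costs are subtracted.

```python
def keep_change(confirm_delta, extra_bytes, extra_runtime_s,
                min_delta=0.002, byte_cost=1e-8, runtime_cost=1e-4):
    """Return True only if the net, cost-adjusted gain is material.

    All thresholds are hypothetical: min_delta is the materiality bar,
    byte_cost and runtime_cost convert resource spend into metric units.
    """
    net = (confirm_delta
           - extra_bytes * byte_cost
           - extra_runtime_s * runtime_cost)
    return net >= min_delta
```

Under this framing, "stronger clip norm with a too-small gain" and "longer warmdown with sharply higher runtime" are both rejected by the same rule: the raw delta was positive, but the net one was not.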

Lesson

A compact-model benchmark can generate lots of numerical motion that is not actually a research win. This is why the log kept rejecting non-material deltas.

Category 4: process failures that still mattered

Some early runs failed for procedural reasons rather than conceptual ones:

  • benchmark timeouts before valid metrics
  • shell parse errors
  • missing scripts in post-run checks
  • runs killed mid-training or mid-eval
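One way to make the loop itself trustworthy is to treat every run as suspect until it both exits cleanly and emits parseable metrics. This is a hypothetical sketch, not the repo's harness: it assumes the benchmark prints a JSON metrics line as its final stdout line, and it converts timeouts, nonzero exits, and missing metrics into explicit failure records instead of half-valid results.

```python
import json
import shlex
import subprocess

def run_benchmark(cmd, timeout_s=3600):
    """Run a benchmark command; accept only clean exits with valid metrics."""
    try:
        proc = subprocess.run(shlex.split(cmd), capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Covers 'benchmark timeouts before valid metrics'.
        return {"ok": False, "reason": "timeout"}
    if proc.returncode != 0:
        # Covers killed runs and script errors.
        return {"ok": False, "reason": f"exit {proc.returncode}"}
    try:
        # Assumed convention: last stdout line is a JSON metrics dict.
        metrics = json.loads(proc.stdout.strip().splitlines()[-1])
    except (ValueError, IndexError):
        return {"ok": False, "reason": "no valid metrics"}
    return {"ok": True, "metrics": metrics}
```

Every procedural failure mode in the list above then shows up as a structured rejection, which is what makes downstream modeling conclusions trustworthy.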

Lesson

These are not exciting, but they shaped the real local lineage. Until the loop itself was trustworthy, stronger modeling conclusions would have been premature.

What this page is trying to prevent

Without a page like this, later readers can easily over-remember only the kept wins and under-remember the shape of the search space that already collapsed.

This page therefore exists to preserve three local facts:

  1. many export-only interventions were not enough
  2. shared-depth work is promising but currently brittle
  3. not every small metric improvement deserves to survive