Objective

Probe a recurrent wide architecture with a cheap local breadth run before spending more budget on confirm/full evaluation.

Candidate setup

  • width increased to around dim=1024
  • aggressive sharing such as LAYER_SHARE_STRIDE=9
  • PROFILE=breadth
  • compression/export path still subject to the hard artifact cap
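
To make the sharing setting concrete, here is a minimal sketch of one possible reading of LAYER_SHARE_STRIDE: each run of `share_stride` consecutive layers reuses a single stored block. That semantics is an assumption, not something this note confirms, and `layer_to_block` is a hypothetical helper rather than part of the training code.

```python
def layer_to_block(n_layers: int, share_stride: int) -> list[int]:
    """Map each layer index to the index of the stored block it reuses.

    Assumed semantics (not confirmed by the note): layers are grouped into
    consecutive runs of `share_stride`, and every layer in a run shares one block.
    """
    if n_layers < 0 or share_stride < 1:
        raise ValueError("n_layers must be >= 0 and share_stride must be >= 1")
    return [layer // share_stride for layer in range(n_layers)]


if __name__ == "__main__":
    # Hypothetical 9-layer model with stride 9: every layer reuses block 0.
    print(layer_to_block(9, 9))
    # Stride 1 degenerates to the unshared case: one unique block per layer.
    print(layer_to_block(4, 1))
```

Under this reading, a stride equal to the depth stores exactly one block, which is the "one wide block" case this experiment is about.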

Why this experiment exists

The point is not merely to test “more width.” It is to test whether reusing one wide block beats storing many thinner unique blocks once compression is part of the score.
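
The trade-off can be illustrated with rough parameter arithmetic. The sketch below assumes a standard transformer block (four attention projections plus an MLP with a 4x hidden size, biases and norms ignored) and a hypothetical 9-layer depth with a thin baseline at dim=512. None of those figures come from this note, and raw parameter counts are only a proxy for compressed artifact bytes.

```python
def block_params(dim: int, mlp_ratio: int = 4) -> int:
    """Approximate weights in one transformer block (assumed layout).

    Attention: Q, K, V and output projections -> 4 * dim^2.
    MLP: up and down projections -> 2 * mlp_ratio * dim^2.
    """
    return 4 * dim * dim + 2 * mlp_ratio * dim * dim


def stored_params(dim: int, n_layers: int, share_stride: int) -> int:
    """Weights that must be stored when blocks are shared every `share_stride` layers."""
    unique_blocks = -(-n_layers // share_stride)  # ceiling division
    return unique_blocks * block_params(dim)


if __name__ == "__main__":
    # Hypothetical comparison: one wide shared block vs. nine thin unique blocks.
    wide_shared = stored_params(dim=1024, n_layers=9, share_stride=9)
    thin_unique = stored_params(dim=512, n_layers=9, share_stride=1)
    print(f"wide shared: {wide_shared:,} params")
    print(f"thin unique: {thin_unique:,} params")
```

In this illustrative setup the wide shared model stores fewer weights than the thin unique stack, while each forward pass still applies a dim=1024 block nine times. Whether that translates into better quality per byte is what the breadth run is meant to probe.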

Metrics to watch

  • val_bpb as the primary proxy target
  • total_bytes and artifact headroom as hard constraints
  • roundtrip degradation versus pre-quant quality
  • step_ms because recurrence can change throughput characteristics
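
Two of these metrics are simple differences, sketched below. The helper names and the example numbers are hypothetical, the cap value is left as a parameter because this note does not state it, and the sketch assumes val_bpb is a lower-is-better bits-per-byte figure.

```python
def artifact_headroom(total_bytes: int, cap_bytes: int) -> int:
    """Bytes left under the hard artifact cap; negative means the cap is exceeded."""
    return cap_bytes - total_bytes


def roundtrip_degradation(pre_quant_bpb: float, post_roundtrip_bpb: float) -> float:
    """Increase in val_bpb caused by the compression/export roundtrip.

    Positive values mean quality got worse after the roundtrip
    (assuming lower bpb is better).
    """
    return post_roundtrip_bpb - pre_quant_bpb


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(artifact_headroom(total_bytes=900, cap_bytes=1000))
    print(roundtrip_degradation(pre_quant_bpb=1.20, post_roundtrip_bpb=1.25))
```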

Decision rule

This idea graduates only if the byte savings from sharing show up as better post-roundtrip quality within the artifact cap, rather than just prettier architecture diagrams.
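
One way to make the rule mechanical is a small gate like the sketch below. It assumes a baseline run with thinner unique blocks is available for comparison. `RunResult`, `graduates`, and the `min_bpb_gain` threshold are hypothetical names introduced for illustration, and the numbers in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    post_roundtrip_bpb: float  # val_bpb measured after the compression/export roundtrip
    total_bytes: int           # size of the exported artifact


def graduates(candidate: RunResult, baseline: RunResult,
              cap_bytes: int, min_bpb_gain: float = 0.0) -> bool:
    """Return True only if the candidate fits the cap and beats the baseline.

    Lower bpb is assumed to be better; `min_bpb_gain` is a hypothetical margin
    to guard against noise in a cheap breadth run.
    """
    if candidate.total_bytes > cap_bytes:
        return False  # hard constraint: over the artifact cap
    gain = baseline.post_roundtrip_bpb - candidate.post_roundtrip_bpb
    return gain > min_bpb_gain


if __name__ == "__main__":
    # Made-up results, for illustration only.
    shared = RunResult(post_roundtrip_bpb=1.18, total_bytes=950)
    unique = RunResult(post_roundtrip_bpb=1.22, total_bytes=990)
    print(graduates(shared, unique, cap_bytes=1000, min_bpb_gain=0.01))
```

Comparing post-roundtrip numbers on both sides keeps the gate tied to the scored artifact rather than to pre-quant quality.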