Objective
Probe a recurrent wide architecture with a cheap local breadth run before spending more budget on confirm/full evaluation.
Candidate setup
- width increased to around dim=1024
- aggressive sharing such as LAYER_SHARE_STRIDE=9 PROFILE=breadth
- compression/export path still subject to the hard artifact cap
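The exact semantics of LAYER_SHARE_STRIDE are not spelled out in this note. The sketch below assumes one plausible reading: every run of `stride` consecutive layers reuses the same stored block. The helper names and the layer counts are hypothetical and only illustrate how a stride of 9 could collapse a stack into a single shared block.

```python
def block_index(layer: int, stride: int) -> int:
    """Map a layer position to the stored block it reuses (assumed semantics)."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return layer // stride


def unique_block_count(num_layers: int, stride: int) -> int:
    """Number of distinct blocks that must be stored for a stack of num_layers."""
    return len({block_index(layer, stride) for layer in range(num_layers)})


# Hypothetical 9-layer stack: stride 9 stores one block, stride 1 stores nine.
shared = unique_block_count(num_layers=9, stride=9)
unshared = unique_block_count(num_layers=9, stride=1)
```

Under this reading, the stride acts as the knob that trades stored bytes for recurrence depth; the real training script may interpret it differently.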
Why this experiment exists
The point is not merely to test “more width.” It is to test whether reusing one wide block beats storing many thinner unique blocks once compression is part of the score.
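A rough parameter count makes the trade concrete. The sketch assumes a transformer-style block with four attention projections and a 4x MLP, ignoring biases, norms, and embeddings; the thin baseline of 9 unique blocks at dim=512 is a made-up comparison point, not a configuration from this experiment.

```python
def block_params(dim: int, mlp_ratio: int = 4) -> int:
    """Approximate weight count of one transformer-style block (assumed shape)."""
    attention = 4 * dim * dim          # Q, K, V and output projections
    mlp = 2 * mlp_ratio * dim * dim    # up and down projections
    return attention + mlp


def stored_params(dim: int, unique_blocks: int) -> int:
    """Weights that must be stored in the artifact, before compression."""
    return unique_blocks * block_params(dim)


wide_shared = stored_params(dim=1024, unique_blocks=1)  # one reused wide block
thin_unique = stored_params(dim=512, unique_blocks=9)   # hypothetical baseline
ratio = thin_unique / wide_shared
```

With these assumptions the single wide block stores fewer weights than the nine thin ones, which is the byte saving the experiment hopes to convert into quality. Whether it does depends on how each variant survives compression, which a raw count cannot show.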
Metrics to watch
- val_bpb as the primary proxy target
- total_bytes and artifact headroom as hard constraints
- roundtrip degradation versus pre-quant quality
- step_ms because recurrence can change throughput characteristics
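The two derived quantities in this list can be computed as below. This is a minimal sketch: the record layout, the field names for pre-quant and post-roundtrip val_bpb, and the cap value are assumptions, and the numbers are placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    val_bpb_pre_quant: float   # val_bpb before the compression/export roundtrip
    val_bpb_roundtrip: float   # val_bpb after the roundtrip
    total_bytes: int           # size of the exported artifact
    step_ms: float             # time per training step


def artifact_headroom(metrics: RunMetrics, cap_bytes: int) -> int:
    """Bytes left under the hard cap; negative means the artifact is too large."""
    return cap_bytes - metrics.total_bytes


def roundtrip_degradation(metrics: RunMetrics) -> float:
    """How much val_bpb worsens across the roundtrip (lower val_bpb is better)."""
    return metrics.val_bpb_roundtrip - metrics.val_bpb_pre_quant


# Placeholder values for illustration only.
example = RunMetrics(val_bpb_pre_quant=1.25, val_bpb_roundtrip=1.5,
                     total_bytes=900, step_ms=40.0)
headroom = artifact_headroom(example, cap_bytes=1000)
degradation = roundtrip_degradation(example)
```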
Decision rule
This idea only graduates if it shows evidence that the byte savings from sharing are turning into real post-roundtrip quality rather than just prettier architecture diagrams.
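One way to make the rule mechanical is sketched below. It assumes a non-shared baseline run exists for comparison and that "real quality" means a lower post-roundtrip val_bpb than that baseline while staying under the cap; the function name, the `min_gain` threshold, and the example numbers are all hypothetical.

```python
def graduates(candidate_bpb_roundtrip: float,
              baseline_bpb_roundtrip: float,
              total_bytes: int,
              cap_bytes: int,
              min_gain: float = 0.0) -> bool:
    """Return True only if the candidate fits the cap and beats the baseline."""
    fits_cap = total_bytes <= cap_bytes
    # Positive gain means the candidate has lower (better) post-roundtrip val_bpb.
    gain = baseline_bpb_roundtrip - candidate_bpb_roundtrip
    return fits_cap and gain > min_gain


# Placeholder values for illustration only.
ok = graduates(1.25, 1.5, total_bytes=900, cap_bytes=1000)
too_big = graduates(1.25, 1.5, total_bytes=1100, cap_bytes=1000)
no_gain = graduates(1.5, 1.25, total_bytes=900, cap_bytes=1000)
```

Comparing post-roundtrip numbers on both sides keeps the check aligned with the rule: pre-quant gains that vanish after export do not count.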