9 items with this tag.
moonshots
Moonshot hypothesis that repeated depth should specialize through a persistent internal role state rather than through stored layer-specific parameters.
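A minimal PyTorch sketch of what such a persistent role state could look like: one weight-tied block reused at every depth, with a small state vector injected as a bias on the block input and updated from a pooled summary after each pass. All module names and sizes here are illustrative assumptions, not taken from the note.

```python
import torch
import torch.nn as nn

class SharedBlockWithRoleState(nn.Module):
    """One weight-tied block reused at every depth; a small role-state
    vector carries depth context instead of per-layer parameters."""
    def __init__(self, d_model=256, d_state=32, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # role state is injected as a bias on the block input ...
        self.state_to_bias = nn.Linear(d_state, d_model)
        # ... and evolved from a pooled summary of the activations
        self.state_update = nn.GRUCell(d_model, d_state)

    def forward(self, x, state):
        h = x + self.state_to_bias(state).unsqueeze(1)  # condition on role state
        q = self.norm1(h)
        h = h + self.attn(q, q, q)[0]
        h = h + self.mlp(self.norm2(h))
        state = self.state_update(h.mean(dim=1), state)  # update the role
        return h, state

block = SharedBlockWithRoleState()
x = torch.randn(2, 16, 256)       # (batch, seq, d_model)
state = torch.zeros(2, 32)
for _ in range(12):               # 12 "layers" of depth, one set of stored weights
    x, state = block(x, state)
```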
hypotheses
Hypothesis that tiny per-depth conditioning can recover much of the specialization lost by strict parameter sharing.
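One plausible form of tiny per-depth conditioning is a FiLM-style learned scale and shift per depth on an otherwise fully shared block. The sketch below assumes that mechanism and hypothetical sizes; the note itself does not commit to this exact variant.

```python
import torch
import torch.nn as nn

class DepthConditionedSharedMLP(nn.Module):
    """One shared MLP block plus only 2*d_model extra parameters per
    depth: a learned scale and shift that lets each depth behave
    slightly differently despite identical core weights."""
    def __init__(self, d_model=256, n_depths=12):
        super().__init__()
        self.core = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))
        self.norm = nn.LayerNorm(d_model, elementwise_affine=False)
        self.scale = nn.Parameter(torch.ones(n_depths, d_model))   # per-depth gamma
        self.shift = nn.Parameter(torch.zeros(n_depths, d_model))  # per-depth beta

    def forward(self, x):
        for d in range(self.scale.shape[0]):
            h = self.norm(x) * self.scale[d] + self.shift[d]
            x = x + self.core(h)
        return x

m = DepthConditionedSharedMLP()
# conditioning adds only n_depths * 2 * d_model = 6144 of ~532k parameters
print(sum(p.numel() for p in m.parameters()))
```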
hypotheses
Concrete architecture hypothesis: use aggressive depth sharing to buy much more width, then spend leftover bytes on stability and selective precision.
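Back-of-envelope arithmetic for the trade, assuming the standard approximation of ~12·d² parameters per transformer block and ignoring embeddings and biases:

```python
# Rough parameter arithmetic: a transformer block costs about 12*d^2
# parameters (4*d^2 for Q/K/V/O attention projections, 8*d^2 for a 4x MLP).
def block_params(d_model):
    return 12 * d_model * d_model

budget = 12 * block_params(768)      # baseline: 12 unique blocks at d_model=768
print(f"{budget / 1e6:.1f}M")        # ~84.9M parameters

# One shared block reused 12 times can instead grow until that single
# block fills the whole budget: solve 12*d^2 = budget.
shared_d = int((budget / 12) ** 0.5)
print(shared_d)                      # ~2660: the same bytes buy ~3.5x the width
```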
hypotheses
Hypothesis that storing fewer unique layers and spending the savings on width or lightweight per-layer adaptation is a better artifact trade than storing many fully unique blocks.
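A LoRA-style sketch of the "lightweight per-layer adaptation" half of that trade, with assumed sizes: one shared matrix plus a small rank-8 adapter per depth, instead of a full unique matrix per depth.

```python
import torch
import torch.nn as nn

class SharedLinearWithDepthAdapters(nn.Module):
    """One shared weight matrix plus a rank-r low-rank adapter per depth:
    stores d*d + n_depths*2*r*d parameters instead of n_depths*d*d."""
    def __init__(self, d=512, n_depths=12, r=8):
        super().__init__()
        self.shared = nn.Linear(d, d, bias=False)
        self.down = nn.Parameter(torch.randn(n_depths, d, r) * 0.02)
        self.up = nn.Parameter(torch.zeros(n_depths, r, d))  # zero-init: adapters start as no-ops

    def forward(self, x, depth):
        return self.shared(x) + (x @ self.down[depth]) @ self.up[depth]

m = SharedLinearWithDepthAdapters()
unique = 12 * 512 * 512                          # cost of 12 fully unique matrices
shared = sum(p.numel() for p in m.parameters())  # shared matrix + all adapters
print(unique, shared)                            # ~3.15M vs ~0.36M parameters
```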
hypotheses
Synthesis hypothesis that the strongest compact artifacts will combine shared depth, activation discipline, selective precision, and cheap specialization rather than relying on one trick alone.
lanes
Why parameter sharing may be the cleanest way to buy width, extra compute, or light specialization under a hard artifact cap.
notes
Synthesis note on the recurring compact-model idea that repeated computation can substitute for stored parameters.
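A toy illustration of the substitution, using bare linear layers as stand-ins for full blocks: looping one stored layer gives the same depth of computation as an untied stack at a twelfth of the storage.

```python
import torch.nn as nn

d, depth = 512, 12
untied = nn.ModuleList(nn.Linear(d, d) for _ in range(depth))  # 12 stored layers
tied = nn.Linear(d, d)                                          # 1 stored layer, looped 12x

def params(mods):
    return sum(p.numel() for m in mods for p in m.parameters())

print(params(untied))   # ~3.15M stored parameters
print(params([tied]))   # ~0.26M: same compute depth at 1/12 the storage
```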
notes
Synthesis note on why recurrent transformers often need tiny phase-specific signals instead of perfectly identical behavior at every depth.
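One way to supply such a phase-specific signal without storing anything per depth is a sinusoidal depth encoding, analogous to position encodings but indexed by iteration. The sketch below assumes that parameter-free variant; the note does not prescribe a specific mechanism.

```python
import torch

def phase_signal(depth, d_model, max_depth=64):
    """A tiny, parameter-free per-depth signal (sinusoidal over depth)
    that breaks the symmetry of a weight-tied loop without storing
    anything per layer."""
    i = torch.arange(d_model // 2)
    freqs = torch.exp(-torch.log(torch.tensor(float(max_depth))) * i / (d_model // 2))
    angle = depth * freqs
    return torch.cat([torch.sin(angle), torch.cos(angle)])

# inject a different (but unlearned, unstored) signal at each iteration
x = torch.randn(2, 16, 256)
for depth in range(12):
    x = x + phase_signal(depth, 256)  # broadcasts over batch and sequence
    # ... the shared block would run here
```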
experiments
A breadth-profile local test of the recurrent-wide-architecture idea: aggressive depth sharing plus width expansion under the artifact cap.
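A hypothetical sweep grid for such a test, reusing the ~12·d² block approximation from above. The budget and configurations are illustrative assumptions, not the experiment's actual settings.

```python
# Hold the parameter budget fixed and trade unique depth for width;
# loops add compute at each config but no storage.
BUDGET = 85_000_000  # hard artifact cap, in parameters (assumed)

def width_for(unique_blocks):
    # widest d_model whose unique blocks fit the budget, via 12*d^2 per block
    return int((BUDGET / (12 * unique_blocks)) ** 0.5)

for unique, loops in [(12, 1), (4, 3), (2, 6), (1, 12)]:
    d = width_for(unique)
    print(f"{unique} unique blocks x {loops} loops -> d_model ~ {d}")
# width grows roughly as the square root of the number of blocks freed
```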