9 items in this folder:
Hypothesis that a smaller recurrent model with bounded extra evaluation-time refinement can beat a larger static artifact under the same storage cap.
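A minimal sketch of the trade, with toy sizes assumed for illustration: one stored weight-tied block iterated k times, so refinement depth becomes an evaluation-time knob with zero storage cost, compared against a static 8-layer stack.

```python
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    """One stored block applied k times; raising k adds compute, not artifact bytes."""
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=4 * d_model, batch_first=True)

    def forward(self, x, k=4):
        for _ in range(k):          # bounded extra refinement at evaluation time
            x = self.block(x)
        return x

count = lambda m: sum(p.numel() for p in m.parameters())
recurrent = RecurrentRefiner()
static = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(256, 4, dim_feedforward=1024, batch_first=True),
    num_layers=8)                   # 8 unique stored layers
print(f"recurrent: {count(recurrent):,} params vs static: {count(static):,}")
x = torch.randn(1, 16, 256)
print(recurrent(x, k=2).shape, recurrent(x, k=8).shape)  # same bytes, more steps
```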
Current working hypotheses for Parameter Golf, with status, rationale, and lane placement.
Hypothesis that compressing or restructuring the LM head can beat modest backbone improvements in compact language models.
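One concrete version of the head-side claim, sketched with assumed sizes: factor the d_model x vocab output projection through a small rank r and compare stored weights against the full head.

```python
import torch
import torch.nn as nn

d_model, vocab, r = 512, 32000, 64

full_head = nn.Linear(d_model, vocab, bias=False)   # 512 * 32000 = 16.4M weights
lowrank_head = nn.Sequential(                       # 512*64 + 64*32000 = 2.1M weights
    nn.Linear(d_model, r, bias=False),
    nn.Linear(r, vocab, bias=False))

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"full head: {count(full_head):,}  rank-{r} head: {count(lowrank_head):,}")

h = torch.randn(2, 8, d_model)
print(lowrank_head(h).shape)  # (2, 8, 32000): same interface, ~1/8 the bytes
```

At these sizes the head alone outweighs a small backbone, which is why a head-side win can dominate modest backbone tweaks.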
Hypothesis that tiny per-depth conditioning can recover much of the specialization lost by strict parameter sharing.
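A hedged sketch of one cheap conditioning scheme (FiLM-style per-depth scale and shift; the scheme choice is an assumption for illustration): the shared block stays fixed while each depth index gets only 2*d extra parameters.

```python
import torch
import torch.nn as nn

class DepthConditionedStack(nn.Module):
    def __init__(self, d=256, depth=12):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.scale = nn.Parameter(torch.ones(depth, d))   # 2*d params per depth
        self.shift = nn.Parameter(torch.zeros(depth, d))
        self.depth = depth

    def forward(self, x):
        for i in range(self.depth):
            # tiny per-depth modulation of the shared block's input
            x = x + self.shared(x * self.scale[i] + self.shift[i])
        return x

m = DepthConditionedStack()
shared = sum(p.numel() for p in m.shared.parameters())
cond = m.scale.numel() + m.shift.numel()
print(f"shared block: {shared:,}  conditioning: {cond:,} ({cond / shared:.1%} extra)")
print(m(torch.randn(2, 256)).shape)
```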
Concrete architecture hypothesis: use aggressive depth sharing to buy much more width, then spend leftover bytes on stability and selective precision.
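Back-of-envelope byte arithmetic for this trade, under assumed costs (a transformer block at roughly 12*d^2 weights, fp16 storage; both numbers are illustrative):

```python
def artifact_mb(unique_layers, d, bytes_per_weight=2.0):
    # assumes ~12 * d^2 weights per block (attention + 4x MLP), fp16 storage
    return unique_layers * 12 * d * d * bytes_per_weight / 1e6

cap = artifact_mb(unique_layers=12, d=512)    # baseline: 12 unique narrow layers
wide = artifact_mb(unique_layers=2, d=1216)   # 2 unique layers, each reused 6x
print(f"baseline: {cap:.1f} MB  shared-and-wider: {wide:.1f} MB")
print(f"leftover for extra norms and selective precision: {cap - wide:.2f} MB")
```

Sharing 12 depths down to 2 stored layers buys roughly 2.4x width at the same cap, with a few MB left over for the stability and precision spend.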
Hypothesis that storing fewer unique layers and spending the savings on width or lightweight per-layer adaptation is a better artifact trade than many fully unique blocks.
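A hedged sketch of the "few unique layers plus cheap adaptation" end of this trade, using LoRA-style per-depth adapters on one shared linear block (the adapter design is an assumption, not taken from the note):

```python
import torch
import torch.nn as nn

class SharedBlockWithAdapters(nn.Module):
    def __init__(self, d=512, depth=12, rank=4):
        super().__init__()
        self.shared = nn.Linear(d, d)   # the only full-size stored block
        self.down = nn.ParameterList(
            [nn.Parameter(torch.randn(d, rank) * 0.01) for _ in range(depth)])
        self.up = nn.ParameterList(
            [nn.Parameter(torch.zeros(rank, d)) for _ in range(depth)])

    def forward(self, x):
        for dn, up in zip(self.down, self.up):
            # shared path plus a rank-4 per-depth correction
            x = x + torch.relu(self.shared(x) + (x @ dn) @ up)
        return x

m = SharedBlockWithAdapters()
unique = 12 * sum(p.numel() for p in m.shared.parameters())  # 12 fully unique blocks
ours = sum(p.numel() for p in m.parameters())
print(f"12 unique blocks: {unique:,}  shared + adapters: {ours:,}")
print(m(torch.randn(2, 512)).shape)
```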
Hypothesis that extra RMSNorm before projections improves post-roundtrip quality by stabilizing low-bit training and export.
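A hedged sketch of the placement being tested, with arbitrary module sizes: an RMSNorm immediately ahead of a projection, so the projection's inputs arrive at a predictable scale regardless of how activations have drifted.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, d, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d))   # only d extra stored weights
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class GuardedProjection(nn.Module):
    """RMSNorm directly before the projection keeps input scale bounded,
    which keeps low-bit quantization grids well-used through training and export."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.norm = RMSNorm(d_in)
        self.proj = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        return self.proj(self.norm(x))

x = torch.randn(2, 8, 256) * 50.0        # deliberately badly scaled activations
print(GuardedProjection(256, 256)(x).abs().mean().item())  # bounded output scale
```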
Hypothesis that protecting a tiny subset of highly sensitive parameters buys disproportionately large quality gains under a strict artifact cap.
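One concrete form of the protection idea, sketched under the assumption that weight magnitude is the sensitivity proxy: round-to-nearest int4 for the bulk, with the top 0.5% of entries spliced back at full precision.

```python
import torch

def quantize(w, bits=4, protect=None):
    """Symmetric round-to-nearest; entries under `protect` keep full precision."""
    if protect is None:
        protect = torch.zeros_like(w, dtype=torch.bool)
    qmax = 2 ** (bits - 1) - 1
    scale = w[~protect].abs().max() / qmax        # outliers excluded from the grid
    deq = torch.clamp((w / scale).round(), -qmax - 1, qmax) * scale
    deq[protect] = w[protect]                     # stored as sparse fp16 pairs
    return deq

w = torch.randn(512, 512)
w.view(-1)[:256] *= 25                            # plant a few sensitive outliers
k = int(w.numel() * 0.005)                        # protect just 0.5% of entries
protect = w.abs() >= w.abs().flatten().topk(k).values.min()
naive = (quantize(w) - w).pow(2).mean().item()
guarded = (quantize(w, protect=protect) - w).pow(2).mean().item()
print(f"roundtrip MSE, naive: {naive:.4f}  with 0.5% protected: {guarded:.4f}")
```

The naive grid is stretched by the outliers, so protecting a sliver of entries shrinks roundtrip error far more than its byte cost suggests.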
Synthesis hypothesis that the strongest compact artifacts will combine shared depth, activation discipline, selective precision, and cheap specialization rather than relying on one trick alone.
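A hedged composition sketch, with every byte cost below assumed purely for illustration: one config object that spends a single cap across all four levers at once rather than maximizing any one of them.

```python
from dataclasses import dataclass

@dataclass
class CompactArtifact:
    d: int = 1024                  # width bought back by depth sharing
    unique_layers: int = 2         # shared-depth backbone
    reuse: int = 6                 # each stored layer applied 6x (12 effective depths)
    adapter_rank: int = 8          # cheap per-depth specialization
    protected_frac: float = 0.005  # selective full precision for sensitive weights
    bits: int = 4                  # bulk precision

    def stored_mb(self) -> float:
        weights = self.unique_layers * 12 * self.d ** 2            # backbone weights
        adapters = self.unique_layers * self.reuse * 2 * self.d * self.adapter_rank
        norms = self.unique_layers * self.reuse * 2 * self.d       # extra RMSNorms
        bulk = (weights + adapters) * self.bits / 8                # low-bit bulk
        protected = weights * self.protected_frac * 6              # fp16 value + index
        return (bulk + protected + norms * 2) / 1e6                # norms kept in fp16

print(f"combined recipe: ~{CompactArtifact().stored_mb():.1f} MB")
```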