This page groups the public record into submission archetypes.
Because the challenge is still young, only a small number of archetypes are actually public, so this page separates:
- observed archetypes: directly visible in public run folders
- expected archetypes: strongly implied by the rules, but not yet established by public run records
Observed archetype 1: the dense tied-embedding baseline
Represented by Naive Baseline.
Core traits:
- straightforward dense transformer layout
- small vocabulary (1024)
- tied embeddings
- moderate attention simplification (4 KV heads)
- conventional training/evaluation flow
- final score reported after the int8 + zlib roundtrip
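The int8 + zlib roundtrip can be made concrete with a short sketch. This is a minimal numpy version assuming per-tensor symmetric quantization; the actual challenge harness may differ in scale, zero-point, and compression details:

```python
import zlib
import numpy as np

def int8_zlib_roundtrip(weights):
    """Quantize a float32 tensor to int8, measure its zlib-compressed
    byte cost, and return the dequantized ("roundtripped") weights
    that scoring would actually see. Scheme details are assumptions."""
    # Per-tensor symmetric scale; floor avoids division by zero.
    scale = max(float(np.abs(weights).max()) / 127.0, 1e-8)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    # Byte budget is counted after zlib compression of the int8 payload.
    compressed_size = len(zlib.compress(q.tobytes(), level=9))
    roundtripped = q.astype(np.float32) * scale
    return compressed_size, roundtripped

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
size_bytes, w_rt = int8_zlib_roundtrip(w)
```

The gap between `w` and `w_rt` is exactly the pre-quant vs post-roundtrip gap the later archetypes care about.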
Why this archetype matters:
- it gives the field a clean reproducible floor
- it demonstrates what “legal and simple” looks like under the artifact cap
- it exposes the output head and tokenizer as budget-relevant, not auxiliary
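A back-of-the-envelope calculation shows why the output head and tokenizer are budget-relevant. The numbers below (`d_model=768`, a GPT-2-sized 50257-entry vocabulary for contrast, int8 storage) are illustrative assumptions, not values from the challenge:

```python
def embedding_head_bytes(vocab_size, d_model, bytes_per_param=1, tied=True):
    """Bytes spent on the token embedding plus LM head, assuming int8
    storage. Tying shares one matrix between input and output sides."""
    copies = 1 if tied else 2
    return vocab_size * d_model * bytes_per_param * copies

small = embedding_head_bytes(1024, 768)                  # tied, 1024 vocab
big = embedding_head_bytes(50257, 768, tied=False)       # untied, GPT-2-sized vocab
```

Under these assumptions the small tied configuration spends roughly two orders of magnitude fewer bytes on vocabulary-shaped tensors, which is why the tokenizer cannot be treated as auxiliary under a byte cap.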
Observed archetype 2: the same artifact family with more training time
Represented by 4-Hour Quasi-10B SP1024.
Core traits:
- same broad architecture family as the dense baseline
- same artifact-cap discipline
- relaxed training-time constraint
- improved score without changing the public conceptual story much
Why this archetype matters:
- it separates artifact design from training budget
- it shows that extra compute helps but may not be the whole frontier
- it makes the pre-quant vs post-roundtrip gap impossible to ignore
Expected archetype 1: shared-depth / recurrent submissions
This is the most obvious not-yet-public archetype suggested by the challenge framing.
If it appears, it will likely draw on:
- recursive and shared-parameter architectures
- Relaxed Recursive Transformers
- Fine-grained Parameter Sharing
- MoEUT
Why it is plausible:
- the byte cap rewards storing fewer unique weights
- recurrence can trade bytes for compute
- the challenge explicitly invites depth recurrence and parameter tying as directions
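The bytes-for-compute trade at the heart of this archetype can be sketched with a toy shared block: one set of stored weights applied repeatedly, so effective depth scales with compute while stored bytes stay constant. The block shape and scaling here are illustrative, not from any public submission:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# The only unique weights: one residual MLP block, reused at every depth step.
W1 = (rng.standard_normal((d, d)) * 0.05).astype(np.float32)
W2 = (rng.standard_normal((d, d)) * 0.05).astype(np.float32)

def shared_depth_forward(x, n_steps):
    """Apply the same block n_steps times: a toy stand-in for
    depth-recurrent / parameter-tied transformers."""
    for _ in range(n_steps):
        x = x + np.maximum(x @ W1, 0.0) @ W2  # residual + ReLU MLP
    return x

x = rng.standard_normal((1, d)).astype(np.float32)
deep = shared_depth_forward(x, 12)            # 12 "layers" of compute
bytes_stored = W1.nbytes + W2.nbytes          # unchanged by n_steps
```

A 12-layer unshared stack would store twelve times these bytes; the recurrent version pays for the extra depth in FLOPs instead.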
Why it is still uncertain:
- no public leaderboard-facing run folder here yet demonstrates the full recipe working end to end
Expected archetype 2: compression-first or quantization-native submissions
Likely ingredients:
- training designed around low-bit robustness
- selective precision for fragile tensors
- explicit outlier handling
- architecture choices that improve post-roundtrip behavior rather than only floating-point quality
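Explicit outlier handling can be sketched as a mixed-precision split: quantize the bulk of a weight matrix to int8, but keep the few wide-range columns in fp16, LLM.int8()-style. The column granularity and threshold below are illustrative assumptions:

```python
import numpy as np

def quantize_with_outliers(w, outlier_thresh=6.0):
    """Split w into an int8 bulk plus fp16 outlier columns: columns
    whose max |value| exceeds outlier_thresh keep higher precision."""
    outliers = np.abs(w).max(axis=0) > outlier_thresh
    bulk = w.copy()
    bulk[:, outliers] = 0.0
    scale = max(float(np.abs(bulk).max()) / 127.0, 1e-8)
    q = np.clip(np.round(bulk / scale), -127, 127).astype(np.int8)
    return q, scale, outliers, w[:, outliers].astype(np.float16)

def dequantize(q, scale, outliers, fp16_cols):
    w = q.astype(np.float32) * scale
    w[:, outliers] = fp16_cols.astype(np.float32)
    return w

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 8)).astype(np.float32)
w[:, 3] *= 100.0  # one fragile, wide-range column
q, scale, outliers, fp16_cols = quantize_with_outliers(w)
w_hat = dequantize(q, scale, outliers, fp16_cols)
```

Without the split, the single wide column would inflate the int8 scale and wash out every other weight; with it, both parts reconstruct accurately.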
Most relevant links:
Expected archetype 3: tokenizer / output-head redesign submissions
Likely ingredients:
- smaller or more specialized vocabularies
- head-factorization or output-side compression ideas
- tokenization choices optimized for the challenge metric rather than generic downstream use
Most relevant links:
- tokenizer and vocabulary efficiency
- The LM head is part of the compression problem
- ReTok
- Vocabulary Compression
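Head factorization, one of the ingredients above, can be sketched as a low-rank output projection: logits pass through a narrow bottleneck instead of a full `d_model x vocab` matrix. The dimensions and rank below are illustrative, not from any public submission:

```python
import numpy as np

d_model, vocab, rank = 768, 1024, 64
rng = np.random.default_rng(0)

# Factorized head: d_model -> rank -> vocab.
A = rng.standard_normal((d_model, rank)).astype(np.float32)
B = rng.standard_normal((rank, vocab)).astype(np.float32)

def factorized_logits(h):
    """logits = (h @ A) @ B, replacing a single d_model x vocab matrix."""
    return (h @ A) @ B

full_params = d_model * vocab          # unfactorized head
fact_params = rank * (d_model + vocab) # factorized head
```

At these sizes the factorized head stores roughly one seventh of the parameters, which is the output-side compression idea in miniature.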
Expected archetype 4: bounded evaluation-time compute submissions
Likely ingredients:
- iterative refinement
- cheap recurrent decoding passes
- extra reasoning or adaptation steps that stay inside the allowed evaluation budget
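The shape of a budgeted refinement loop can be sketched abstractly: apply a cheap update repeatedly, stopping when the iterate converges or the evaluation-time step budget runs out. The update function, tolerance, and budget here are all illustrative:

```python
import numpy as np

def refine_within_budget(x, step_fn, max_steps, tol=1e-4):
    """Iterative refinement under a fixed step budget: returns the
    refined value and how many steps were actually spent."""
    for steps in range(1, max_steps + 1):
        x_new = step_fn(x)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, steps
        x = x_new
    return x, max_steps

# Toy contraction as the cheap refinement pass; it converges
# well inside the budget, so steps are left unspent.
x0 = np.ones(8, dtype=np.float32)
x_final, used = refine_within_budget(x0, lambda v: 0.5 * v, max_steps=50)
```

The point of the archetype is the accounting: extra quality is bought with `used` evaluation-time steps rather than with stored bytes.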
Most relevant links:
- evaluation-time compute and inference scaling
- Inference Scaling Laws
- Iterative refinement over stored depth
Editorial rule for this section
Until more public run folders appear, the safest reading is:
- the baseline archetypes are real
- the more exotic archetypes are challenge-implied but not yet publicly demonstrated here