This page groups the public record into submission archetypes.

Because the challenge is still young, only a small number of archetypes are actually public, so this page separates:

  • observed archetypes: directly visible in public run folders
  • expected archetypes: strongly implied by the rules, but not yet established by public run records

Observed archetype 1: the dense tied-embedding baseline

Represented by Naive Baseline.

Core traits:

  • straightforward dense transformer layout
  • small vocabulary (1024)
  • tied embeddings
  • moderate attention simplification (4 KV heads)
  • conventional training/evaluation flow
  • final score reported after int8 + zlib roundtrip
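
The int8 + zlib roundtrip behind that last trait can be sketched as follows. This is a hypothetical measurement harness, not the challenge's actual scoring code; the tensor name and shapes are invented for illustration:

```python
import zlib
import numpy as np

def roundtrip_bytes(weights: dict) -> int:
    """Quantize each float tensor to int8 (symmetric, per-tensor),
    concatenate the raw bytes, and return the zlib-compressed size."""
    payload = bytearray()
    for name, w in sorted(weights.items()):
        scale = np.abs(w).max() / 127.0 or 1.0  # guard against all-zero tensors
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        payload += q.tobytes()
    return len(zlib.compress(bytes(payload), level=9))

# hypothetical example: one tied embedding table, vocab 1024, width 64
rng = np.random.default_rng(0)
weights = {"embed": rng.normal(size=(1024, 64)).astype(np.float32)}
size = roundtrip_bytes(weights)
```

The compressed size, not the raw parameter count, is what counts against the cap, which is why the output head and tokenizer stop being auxiliary.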

Why this archetype matters:

  • it gives the field a clean reproducible floor
  • it demonstrates what “legal and simple” looks like under the artifact cap
  • it exposes the output head and tokenizer as budget-relevant, not auxiliary

Observed archetype 2: the same artifact family with more training time

Represented by 4-Hour Quasi-10B SP1024.

Core traits:

  • same broad architecture family as the dense baseline
  • same artifact-cap discipline
  • relaxed training-time constraint
  • improved score without changing the public conceptual story much

Why this archetype matters:

  • it separates artifact design from training budget
  • it shows that extra compute helps but may not be the whole frontier
  • it makes the pre-quant vs post-roundtrip gap impossible to ignore

Expected archetype 1: shared-depth / recurrent submissions

This is the most obvious not-yet-public archetype suggested by the challenge framing.

Why it is plausible:

  • the byte cap rewards storing fewer unique weights
  • recurrence can trade bytes for compute
  • the challenge explicitly invites depth recurrence and parameter tying as directions
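
The bytes-for-compute trade above can be sketched in a few lines. The numpy block below is a hypothetical illustration of depth recurrence with a single tied layer, not a description of any actual submission:

```python
import numpy as np

def recurrent_forward(x, W, b, depth):
    """Apply one stored layer `depth` times (parameter tying across depth).
    Stored bytes stay constant while effective depth grows."""
    for _ in range(depth):
        x = np.maximum(0.0, x @ W + b)  # the same ReLU layer, reused
    return x

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(16, 16)).astype(np.float32)
b = np.zeros(16, dtype=np.float32)
x = rng.normal(size=(4, 16)).astype(np.float32)

out = recurrent_forward(x, W, b, depth=8)
# 8 virtual layers, but only one 16x16 matrix ever hits the artifact cap
```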

Why it is still uncertain:

  • no public leaderboard-facing run folder here yet demonstrates the full recipe working end to end

Expected archetype 2: compression-first or quantization-native submissions

Likely ingredients:

  • training designed around low-bit robustness
  • selective precision for fragile tensors
  • explicit outlier handling
  • architecture choices that improve post-roundtrip behavior rather than only floating-point quality
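
A minimal sketch of the selective-precision and outlier-handling ingredients, assuming a simple per-tensor int8 scheme; the split fraction and the scheme itself are invented for illustration:

```python
import numpy as np

def split_outliers(w, frac=0.01):
    """Keep the largest-magnitude `frac` of weights in float32,
    quantize the rest to int8 (hypothetical outlier-handling scheme)."""
    flat = w.ravel()
    k = max(1, int(len(flat) * frac))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # positions of the outliers
    outliers = flat[idx].copy()
    rest = flat.copy()
    rest[idx] = 0.0
    scale = np.abs(rest).max() / 127.0 or 1.0
    q = np.clip(np.round(rest / scale), -127, 127).astype(np.int8)
    return q, scale, idx, outliers

def reconstruct(q, scale, idx, outliers):
    flat = q.astype(np.float32) * scale
    flat[idx] = outliers  # restore the fragile values exactly
    return flat

# a heavy-tailed tensor: gaussian body plus a few synthetic outliers
rng = np.random.default_rng(1)
w = rng.normal(size=4096).astype(np.float32)
w[:4] = [60.0, -55.0, 50.0, -45.0]

q, scale, idx, outliers = split_outliers(w)
err_split = np.mean((reconstruct(q, scale, idx, outliers) - w) ** 2)

naive_scale = np.abs(w).max() / 127.0  # outliers blow up the shared scale
naive = np.clip(np.round(w / naive_scale), -127, 127).astype(np.float32) * naive_scale
err_naive = np.mean((naive - w) ** 2)
```

The point of the sketch: a handful of outliers inflates the shared quantization scale, so handling them separately improves post-roundtrip error for everything else.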

Expected archetype 3: tokenizer / output-head redesign submissions

Likely ingredients:

  • smaller or more specialized vocabularies
  • head-factorization or output-side compression ideas
  • tokenization choices optimized for the challenge metric rather than generic downstream use
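
The head-factorization idea can be sketched as a low-rank bottleneck on the output projection; the sizes below are hypothetical:

```python
import numpy as np

d_model, vocab, rank = 256, 1024, 32  # hypothetical sizes

# dense head: one d_model x vocab matrix
full_params = d_model * vocab
# factorized head: d_model x rank plus rank x vocab
fact_params = d_model * rank + rank * vocab

def factored_logits(h, A, B):
    """Logits through a low-rank bottleneck instead of a dense head."""
    return (h @ A) @ B

rng = np.random.default_rng(0)
h = rng.normal(size=(2, d_model)).astype(np.float32)
A = rng.normal(size=(d_model, rank)).astype(np.float32)
B = rng.normal(size=(rank, vocab)).astype(np.float32)
logits = factored_logits(h, A, B)
```

At these sizes the factorized head stores 40,960 parameters versus 262,144 dense, at the cost of rank-limited logits.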

Expected archetype 4: bounded evaluation-time compute submissions

Likely ingredients:

  • iterative refinement
  • cheap recurrent decoding passes
  • extra reasoning or adaptation steps that stay inside the allowed evaluation budget
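
The common shape of these ingredients is a refinement loop with a hard step cap. The sketch below is generic and hypothetical (Newton's method standing in for a cheap decoding pass), not any submission's actual loop:

```python
def refine(x0, step, budget, tol=1e-6):
    """Run extra refinement passes, but never more than `budget` of them:
    a stand-in for staying inside a fixed evaluation-time compute budget."""
    x = x0
    for steps_used in range(1, budget + 1):
        nxt = step(x)
        if abs(nxt - x) < tol:  # converged early, return unused budget
            return nxt, steps_used
        x = nxt
    return x, budget  # budget exhausted, return best estimate so far

# Newton iteration toward sqrt(2) as the hypothetical refinement step
root, used = refine(1.0, lambda x: 0.5 * (x + 2.0 / x), budget=16)
```

The design point is that quality comes from the loop, not from stored bytes, and the cap on `budget` is what keeps the submission legal.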

Editorial rule for this section

Until more public run folders appear, the safest reading is:

  • the baseline archetypes are real
  • the more exotic archetypes are challenge-implied but not yet publicly demonstrated here