This page groups the public record into submission archetypes.
Because the challenge is still young, only a small number of archetypes are actually public, so this page separates:
- observed archetypes: directly visible in public run folders
- expected archetypes: strongly implied by the rules, but not yet established by public run records
Observed archetype 1: the dense tied-embedding baseline
Represented by Naive Baseline.
Core traits:
- straightforward dense transformer layout
- small vocabulary (1024)
- tied embeddings
- moderate attention simplification (4 KV heads)
- conventional training/evaluation flow
- final score reported after the int8 + zlib roundtrip
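The int8 + zlib roundtrip can be made concrete with a short sketch. This is a minimal numpy version assuming per-tensor symmetric quantization; the actual challenge harness may differ in scale, zero-point, and compression details:

```python
import zlib
import numpy as np

def int8_zlib_roundtrip(weights):
    """Quantize a float32 tensor to int8, measure its zlib-compressed
    byte cost, and return the dequantized ("roundtripped") weights
    that scoring would actually see. Scheme details are assumptions."""
    # Per-tensor symmetric scale; floor avoids division by zero.
    scale = max(float(np.abs(weights).max()) / 127.0, 1e-8)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    # Byte budget is counted after zlib compression of the int8 payload.
    compressed_size = len(zlib.compress(q.tobytes(), level=9))
    roundtripped = q.astype(np.float32) * scale
    return compressed_size, roundtripped

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
size_bytes, w_rt = int8_zlib_roundtrip(w)
```

The gap between `w` and `w_rt` is exactly the pre-quant vs post-roundtrip gap the later archetypes care about.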
Why this archetype matters:
- it gives the field a clean reproducible floor
- it demonstrates what “legal and simple” looks like under the artifact cap
- it exposes the output head and tokenizer as budget-relevant, not auxiliary
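A back-of-the-envelope calculation shows why the output head and tokenizer are budget-relevant. The numbers below (`d_model=768`, a GPT-2-sized 50257-entry vocabulary for contrast, int8 storage) are illustrative assumptions, not values from the challenge:

```python
def embedding_head_bytes(vocab_size, d_model, bytes_per_param=1, tied=True):
    """Bytes spent on the token embedding plus LM head, assuming int8
    storage. Tying shares one matrix between input and output sides."""
    copies = 1 if tied else 2
    return vocab_size * d_model * bytes_per_param * copies

small = embedding_head_bytes(1024, 768)                  # tied, 1024 vocab
big = embedding_head_bytes(50257, 768, tied=False)       # untied, GPT-2-sized vocab
```

Under these assumptions the small tied configuration spends roughly two orders of magnitude fewer bytes on vocabulary-shaped tensors, which is why the tokenizer cannot be treated as auxiliary under a byte cap.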
Observed archetype 2: the same artifact family with more training time
Represented by 4-Hour Quasi-10B SP1024.
Core traits:
- same broad architecture family as the dense baseline
- same artifact-cap discipline
- relaxed training-time constraint
- improved score without changing the public conceptual story much
Why this archetype matters:
- it separates artifact design from training budget
- it shows that extra compute helps but may not be the whole frontier
- it makes the pre-quant vs post-roundtrip gap impossible to ignore
Expected archetype 1: shared-depth / recurrent submissions
This is the most obvious not-yet-public archetype suggested by the challenge framing.
If it appears, it will likely draw on:
- recursive and shared-parameter architectures
- Relaxed Recursive Transformers
- Fine-grained Parameter Sharing
- MoEUT
Why it is plausible:
- the byte cap rewards storing fewer unique weights
- recurrence can trade bytes for compute
- the challenge explicitly invites depth recurrence and parameter tying as directions
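The bytes-for-compute trade at the heart of this archetype can be sketched with a toy shared block: one set of stored weights applied repeatedly, so effective depth scales with compute while stored bytes stay constant. The block shape and scaling here are illustrative, not from any public submission:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# The only unique weights: one residual MLP block, reused at every depth step.
W1 = (rng.standard_normal((d, d)) * 0.05).astype(np.float32)
W2 = (rng.standard_normal((d, d)) * 0.05).astype(np.float32)

def shared_depth_forward(x, n_steps):
    """Apply the same block n_steps times: a toy stand-in for
    depth-recurrent / parameter-tied transformers."""
    for _ in range(n_steps):
        x = x + np.maximum(x @ W1, 0.0) @ W2  # residual + ReLU MLP
    return x

x = rng.standard_normal((1, d)).astype(np.float32)
deep = shared_depth_forward(x, 12)            # 12 "layers" of compute
bytes_stored = W1.nbytes + W2.nbytes          # unchanged by n_steps
```

A 12-layer unshared stack would store twelve times these bytes; the recurrent version pays for the extra depth in FLOPs instead.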
Why it is still uncertain:
- no public leaderboard-facing run folder here yet demonstrates the full recipe working end to end
Expected archetype 2: compression-first or quantization-native submissions
Likely ingredients:
- training designed around low-bit robustness
- selective precision for fragile tensors
- explicit outlier handling
- architecture choices that improve post-roundtrip behavior rather than only floating-point quality
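Explicit outlier handling can be sketched as a mixed-precision split: quantize the bulk of a weight matrix to int8, but keep the few wide-range columns in fp16, LLM.int8()-style. The column granularity and threshold below are illustrative assumptions:

```python
import numpy as np

def quantize_with_outliers(w, outlier_thresh=6.0):
    """Split w into an int8 bulk plus fp16 outlier columns: columns
    whose max |value| exceeds outlier_thresh keep higher precision."""
    outliers = np.abs(w).max(axis=0) > outlier_thresh
    bulk = w.copy()
    bulk[:, outliers] = 0.0
    scale = max(float(np.abs(bulk).max()) / 127.0, 1e-8)
    q = np.clip(np.round(bulk / scale), -127, 127).astype(np.int8)
    return q, scale, outliers, w[:, outliers].astype(np.float16)

def dequantize(q, scale, outliers, fp16_cols):
    w = q.astype(np.float32) * scale
    w[:, outliers] = fp16_cols.astype(np.float32)
    return w

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 8)).astype(np.float32)
w[:, 3] *= 100.0  # one fragile, wide-range column
q, scale, outliers, fp16_cols = quantize_with_outliers(w)
w_hat = dequantize(q, scale, outliers, fp16_cols)
```

Without the split, the single wide column would inflate the int8 scale and wash out every other weight; with it, both parts reconstruct accurately.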
Most relevant links:
Expected archetype 3: tokenizer / output-head redesign submissions
Likely ingredients:
- smaller or more specialized vocabularies
- head-factorization or output-side compression ideas
- tokenization choices optimized for the challenge metric rather than generic downstream use
Most relevant links:
- tokenizer and vocabulary efficiency
- The LM head is part of the compression problem
- ReTok
- Vocabulary Compression
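Head factorization, one of the ingredients above, can be sketched as a low-rank output projection: logits pass through a narrow bottleneck instead of a full `d_model x vocab` matrix. The dimensions and rank below are illustrative, not from any public submission:

```python
import numpy as np

d_model, vocab, rank = 768, 1024, 64
rng = np.random.default_rng(0)

# Factorized head: d_model -> rank -> vocab.
A = rng.standard_normal((d_model, rank)).astype(np.float32)
B = rng.standard_normal((rank, vocab)).astype(np.float32)

def factorized_logits(h):
    """logits = (h @ A) @ B, replacing a single d_model x vocab matrix."""
    return (h @ A) @ B

full_params = d_model * vocab          # unfactorized head
fact_params = rank * (d_model + vocab) # factorized head
```

At these sizes the factorized head stores roughly one seventh of the parameters, which is the output-side compression idea in miniature.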
Expected archetype 4: bounded evaluation-time compute submissions
Likely ingredients:
- iterative refinement
- cheap recurrent decoding passes
- extra reasoning or adaptation steps that stay inside the allowed evaluation budget
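The shape of a budgeted refinement loop can be sketched abstractly: apply a cheap update repeatedly, stopping when the iterate converges or the evaluation-time step budget runs out. The update function, tolerance, and budget here are all illustrative:

```python
import numpy as np

def refine_within_budget(x, step_fn, max_steps, tol=1e-4):
    """Iterative refinement under a fixed step budget: returns the
    refined value and how many steps were actually spent."""
    for steps in range(1, max_steps + 1):
        x_new = step_fn(x)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, steps
        x = x_new
    return x, max_steps

# Toy contraction as the cheap refinement pass; it converges
# well inside the budget, so steps are left unspent.
x0 = np.ones(8, dtype=np.float32)
x_final, used = refine_within_budget(x0, lambda v: 0.5 * v, max_steps=50)
```

The point of the archetype is the accounting: extra quality is bought with `used` evaluation-time steps rather than with stored bytes.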
Most relevant links:
- evaluation-time compute and inference scaling
- Inference Scaling Laws
- Iterative refinement over stored depth
Editorial rule for this section
Until more public run folders appear, the safest reading is:
- the baseline archetypes are real
- the more exotic archetypes are challenge-implied but not yet publicly demonstrated here