This page is about the public challenge record and the visible research storyline around it. It is not a private experiment log; for the repository's own, much denser chronology, consult the local experiment history.

Why a history page matters

Parameter Golf moves fast, and even obvious ideas can look novel when they are rediscovered in isolation. A good history page helps separate:

  • what the baseline already taught everyone
  • what kinds of public runs or PRs have already made a direction legible
  • what still looks underexplored in the open record

Phase 0: the baseline clarified the real objective

The earliest public record made several things obvious:

  • the challenge is about the final artifact, not an uncompressed checkpoint
  • post-export degradation is large enough to matter
  • the budget is tight enough that bytes must be treated as a first-class design resource
  • even code bytes are worth tracking because the margin is not infinite

The baseline also established the challenge’s basic mental model: a respectable small transformer can be competitive, but only if its export path survives the roundtrip cleanly.

Phase 1: the field quickly shifted from “small dense model” to “bytes-aware model”

Once the rules were taken seriously, the center of gravity moved away from plain dense scaling and toward techniques like:

  • tied embeddings and output-side discipline
  • shared depth or recurrent blocks
  • compression-aware regularization
  • quantization-aware stabilization
  • export-side heuristics for preserving the most fragile weights

That shift is important historically because it marks the point where Parameter Golf stopped looking like standard small-model training and started looking like a specialized artifact-optimization problem.
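As a concrete illustration of the first item in the list above, here is a minimal NumPy sketch of tied embeddings. The vocabulary size, width, and variable names are hypothetical, chosen for illustration rather than taken from any particular entry:

```python
import numpy as np

V, d = 1000, 64  # hypothetical vocab size and model width
rng = np.random.default_rng(0)

# One shared token-embedding table, used both to embed inputs
# and as the transposed output projection.
E = rng.standard_normal((V, d)).astype(np.float32)

def logits_tied(h):
    # Output side reuses E instead of storing a second V x d matrix.
    return h @ E.T

untied_params = 2 * V * d  # separate input and output matrices
tied_params = V * d        # a single shared matrix
```

Under a byte cap, that single shared matrix roughly halves the embedding-side storage, which is why vocabulary discipline shows up so early in the public record.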

Phase 2: recurrence and sharing became publicly central

Among publicly visible directions, parameter sharing emerged very quickly as a natural frontier.

Why:

  • the byte cap punishes fully unique depth
  • repeated structure can buy much more effective depth per stored byte
  • recurrence also creates a natural bridge to evaluation-time compute

This is why recursive and shared-parameter architectures became one of the core pages in the garden rather than a niche side path.
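The bytes-per-depth argument above can be made concrete with a small sketch. Assuming a toy tanh block and hypothetical sizes, one stored weight matrix reused at every "layer" buys effective depth while storing only a single block:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
# One stored block; recurrence applies it repeatedly.
W = (rng.standard_normal((d, d)) * 0.1).astype(np.float32)

def shared_forward(x, depth):
    # The same stored weights are reused at every step,
    # so stored bytes do not grow with depth.
    for _ in range(depth):
        x = np.tanh(x @ W)
    return x

depth = 8
stored_shared = W.size          # parameters actually stored
stored_unique = W.size * depth  # what fully unique depth would cost
print(stored_unique // stored_shared)  # -> 8
```

The same loop also hints at the bridge to evaluation-time compute: `depth` can be raised at inference without storing another byte.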

Phase 3: the score path itself became a research object

Another public shift was the realization that the export path, the route from trained weights to the scored artifact, is not just bookkeeping. It is part of the model.

That encouraged work on:

  • roundtrip-aware training objectives
  • normalization and scaling changes that reduce export fragility
  • outlier-aware or sensitivity-aware weight treatment
  • direct attempts to shrink the gap between floating-point quality and scored-artifact quality

This is the moment where quantization and outliers stopped being a deployment concern and became a core modeling concern.
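As one hedged illustration of that fragility, consider a symmetric per-tensor int8 roundtrip. This is a common scheme used here only as a sketch, not necessarily the challenge's actual export format; it shows how a single outlier weight inflates the quantization scale and degrades every other weight:

```python
import numpy as np

def int8_roundtrip(w):
    # Symmetric per-tensor int8: one scale for the whole tensor,
    # encode to int8, then decode back to float.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal(1024).astype(np.float32)
gap = np.abs(w - int8_roundtrip(w)).max()

# A single large outlier stretches the scale, so the
# reconstruction error grows for all the ordinary weights too.
w[0] = 50.0
gap_outlier = np.abs(w - int8_roundtrip(w)).max()
```

Outlier-aware or sensitivity-aware treatment is, in this framing, just an attempt to keep `gap` small without paying for the outliers everywhere.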

What the public record seems to have covered already

The open challenge conversation has made at least these directions clearly legible:

  • dense baseline models with careful export
  • tied embeddings and vocabulary discipline
  • shared-depth / recurrent ideas
  • compression-aware regularization
  • basic evaluation-side tricks and scoring proxies

Anyone claiming novelty should assume those areas are already part of the public conversation unless the contribution is genuinely more specific.

What still appears relatively open in the public record

The following directions still look less saturated or less settled:

  • more radical artifact formats than straightforward dense low-bit exports
  • selective precision with especially clean metadata accounting
  • tokenizer redesign aimed directly at the challenge metric
  • deeper use of evaluation-time latent refinement, not just lightweight sidecars
  • cross-layer basis-sharing schemes that are explicitly optimized for stored bytes

That does not guarantee they are unexplored in private, only that they still feel comparatively open in the visible record.

What public runs have already taught, even without a final winner

A few durable lessons seem clear:

  1. Artifact-aware quality is the real battleground. A model that looks great before export can still lose the challenge.

  2. Parameter reuse is not a cute trick. It is one of the most natural ways to buy capability under the byte cap.

  3. Small stabilizing changes can matter a lot. Better scaling, normalization, or outlier handling may beat a flashier architecture change.

  4. The frontier is compositional. Winning ideas will likely combine sharing, quantization robustness, and budget-aware evaluation rather than rely on a single gimmick.

How to use this page

Use this page to keep the challenge narrative straight: