Moonshot

Store one canonical tensor and a family of cheap transformations:

  • permutations
  • sign flips
  • rescalings
  • low-rank transport maps

Then reconstruct many use-site tensors as transformed versions of the prototype.
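As a concrete sketch of that reconstruction, assuming the transform family above (all names and shapes here are illustrative, not a committed design): a use-site tensor is decoded by permuting, sign-flipping, and rescaling the prototype's rows, then adding a rank-r transport correction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4

# One canonical prototype shared by a class of layers.
prototype = rng.standard_normal((d, d))

# Per-use-site transport parameters, each cheap relative to d*d floats:
perm = rng.permutation(d)                 # d integers
signs = rng.choice([-1.0, 1.0], size=d)   # d sign bits
scale = rng.uniform(0.5, 2.0, size=d)     # d scalars
U = rng.standard_normal((d, r))           # low-rank transport factors
V = rng.standard_normal((r, d))

def decode(prototype, perm, signs, scale, U, V):
    """Reconstruct one use-site tensor from the prototype plus its transport."""
    W = prototype[perm]                    # permute rows
    W = W * (signs * scale)[:, None]       # per-row sign flip and rescale
    W = W + U @ V                          # low-rank transport map
    return W

W_site = decode(prototype, perm, signs, scale, U, V)
```

Note the asymmetry in the accounting: the per-site transport here costs 3d + 2dr numbers against d*d for an explicit tensor.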

Why this is outside the current prior

Most weight-sharing work assumes exact tying or lightweight additive adapters. This moonshot instead treats many nominally unique tensors as possibly the same object viewed through different coordinate systems.

Mechanism sketch

  • learn prototype matrices for a class of layers
  • store tiny transport parameters per use-site
  • decode actual tensors through prototype + transform
  • add sparse residual only where transport fails badly
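The steps above can be sketched end to end. This toy example (synthetic data, hypothetical sparsity budget) decodes a target through a known transport and keeps only the largest residual entries, i.e. a sparse residual added only where transport fails badly.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
prototype = rng.standard_normal((d, d))

# Synthetic use-site tensor: a permuted, rescaled prototype plus small noise.
perm = rng.permutation(d)
scale = rng.uniform(0.8, 1.2, size=d)
target = prototype[perm] * scale[:, None] + 0.01 * rng.standard_normal((d, d))

# Decode through the transport, then keep only the top 1% of residual entries.
decoded = prototype[perm] * scale[:, None]
residual = target - decoded
k = max(1, int(0.01 * residual.size))          # sparsity budget (assumed)
thresh = np.partition(np.abs(residual).ravel(), -k)[-k]
sparse_residual = np.where(np.abs(residual) >= thresh, residual, 0.0)

approx = decoded + sparse_residual
rel_err = np.linalg.norm(target - approx) / np.linalg.norm(target)
```

In a real system the transport would be fitted, not known; the point of the sketch is only the decode path and the residual gating.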

Why it might matter for Parameter Golf

If many tensors differ mostly by cheap transforms, then storing them independently is wasteful. A prototype-plus-transport representation could compress repeated structure far more aggressively than exact tying without paying for fully unique layers.

Cheapest falsifier

  • fit one prototype and one small transport family to a repeated tensor class
  • compare final bytes versus explicit storage or strict tying

Kill it if transport params become nearly as expensive as direct storage.
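The falsifier reduces to byte accounting. A back-of-envelope sketch, with made-up sizes (d, N, r are assumptions, not measurements): compare explicit float32 storage against one prototype plus per-site transport parameters.

```python
# Hypothetical tensor class: N use-sites of a d x d float32 tensor.
d, N, r = 1024, 24, 4

explicit = N * d * d * 4                       # bytes, store every tensor directly

# Prototype + per-site transport: permutation (int32), signs (int8),
# per-row scales (float32), and a rank-r factor pair (float32).
prototype_bytes = d * d * 4
per_site = d * 4 + d * 1 + d * 4 + 2 * d * r * 4
transport = prototype_bytes + N * per_site

ratio = explicit / transport                   # >1 means the representation wins
```

The kill criterion is then just `ratio` drifting toward 1 as the transport family grows to fit real tensors.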

What would make it real

  • prototype reuse across many layers or blocks
  • transport parameters remain tiny
  • residual path is sparse enough to justify the representation