1 item with this tag.

  • papers

    ALBERT

    Paper note on cross-layer parameter sharing and factorized embeddings as two clean ways to reduce stored parameters without simply shrinking hidden capacity.
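
    A rough sketch of the parameter arithmetic behind those two ideas. The function names, the 12 * H^2 per-layer estimate (attention plus feed-forward weights, ignoring biases and layer norms), and the example sizes are illustrative assumptions, not figures taken from the note.

```python
def embedding_params(vocab_size, hidden_size, embed_size=None):
    """Count token-embedding parameters.

    With embed_size=None the table maps tokens straight to the hidden size
    (V * H). Otherwise it is factorized into a V * E lookup table followed
    by an E * H projection.
    """
    if embed_size is None:
        return vocab_size * hidden_size
    return vocab_size * embed_size + embed_size * hidden_size


def encoder_params(per_layer_params, num_layers, share_layers):
    """Count stored encoder parameters.

    With cross-layer sharing, one set of layer weights is stored and
    reused at every depth, so the stored count does not grow with depth.
    """
    return per_layer_params if share_layers else per_layer_params * num_layers


# Hypothetical sizes chosen only for illustration.
V, H, E, L = 30_000, 768, 128, 12

# Assumed rough per-layer estimate: 4*H^2 (attention) + 8*H^2 (feed-forward).
per_layer = 12 * H * H

print("embedding, unfactorized:", embedding_params(V, H))
print("embedding, factorized:  ", embedding_params(V, H, E))
print("encoder, unshared:      ", encoder_params(per_layer, L, share_layers=False))
print("encoder, shared:        ", encoder_params(per_layer, L, share_layers=True))
```

    In both cases the hidden size H is unchanged. Sharing reduces what is stored, not the computation per forward pass, since the shared layer is still applied L times.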