Abstract
Techniques are described for conditioning beam-search decoding on encoder outputs without constructing cross-attention matrices over encoder positions. An encoder output is compressed once into a fixed-size summary vector using learned query vectors. For each beam, a decoder query is mapped to a low-dimensional latent code by a code generator. A hypernetwork generates parameters for a per-beam primary network as a function of the latent code, including affine weight generation. The primary network is applied to the shared summary to produce a beam-specific output vector, with per-beam computation independent of encoder sequence length. A continuous latent code space supports interpolation, extrapolation, and centroid computation to synthesize additional beam hypotheses without additional decoder forward passes. Training may include distillation from a cross-attention model, end-to-end fine-tuning, and optional code diversity regularization.
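The abstract describes a pipeline in prose only. The NumPy sketch below shows one possible reading of it, with shapes made explicit. It is a toy illustration under stated assumptions, not the disclosed implementation. Every function name is hypothetical, and so are the following design choices:

- softmax attention pooling over the learned queries to build the summary;
- a tanh code generator;
- the affine hypernetwork form W(z) = W0 + sum_i z_i * A_i;
- a single linear layer as the primary network;
- MSE distillation against a simplified dot-product cross-attention teacher;
- a mean-cosine-similarity diversity penalty.

The point it illustrates is that the encoder summary is computed once and has a fixed size. Per-beam work therefore touches only the summary, never the T encoder positions.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compress_encoder(enc, queries):
    """Pool encoder output enc (T, d) into a flat (K*d,) summary using K learned
    queries (K, d). Run once per source; the output size does not depend on T.
    Assumption: softmax attention pooling."""
    weights = softmax(queries @ enc.T / np.sqrt(enc.shape[1]), axis=-1)  # (K, T)
    return (weights @ enc).reshape(-1)  # (K*d,)


def code_generator(q, W_c, b_c):
    """Map per-beam decoder queries q (B, d) to low-dimensional codes (B, r).
    Assumption: a single tanh layer."""
    return np.tanh(q @ W_c + b_c)


def hypernet_affine(z, A, W0):
    """Hypothetical affine weight generation: W(z) = W0 + sum_i z_i * A_i.
    z: (B, r), A: (r, out, in), W0: (out, in) -> per-beam weights (B, out, in)."""
    return W0[None] + np.einsum("br,roi->boi", z, A)


def primary_network(W, s):
    """Apply each beam's generated weights to the shared summary s (in,).
    Cost per beam is O(out * in), with no dependence on T."""
    return np.einsum("boi,i->bo", W, s)


def interpolate(z_a, z_b, alpha):
    """Linear path between two codes; alpha outside [0, 1] extrapolates."""
    return (1.0 - alpha) * z_a + alpha * z_b


def centroid(z):
    """Mean code over beams: (B, r) -> (r,)."""
    return z.mean(axis=0)


def cross_attention_teacher(q, enc):
    """Simplified teacher: plain dot-product cross-attention over all T positions."""
    w = softmax(q @ enc.T / np.sqrt(enc.shape[1]), axis=-1)  # (B, T)
    return w @ enc  # (B, d)


def distill_loss(student, teacher):
    """Mean squared error between student and teacher outputs."""
    return float(np.mean((student - teacher) ** 2))


def code_diversity_penalty(z, eps=1e-8):
    """Mean off-diagonal cosine similarity between beam codes (lower = more diverse).
    Requires at least two beams."""
    zn = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    sim = zn @ zn.T
    mask = ~np.eye(z.shape[0], dtype=bool)
    return float(sim[mask].mean())


# Toy sizes: model dim d, source length T, K summary queries, code dim r, B beams.
d, T, K, r, B = 8, 50, 4, 3, 5
enc = rng.standard_normal((T, d))
queries = rng.standard_normal((K, d))
W_c, b_c = rng.standard_normal((d, r)) * 0.5, np.zeros(r)
A = rng.standard_normal((r, d, K * d)) * 0.1
W0 = rng.standard_normal((d, K * d)) * 0.1

s = compress_encoder(enc, queries)                 # once per source sequence
q = rng.standard_normal((B, d))                    # one decoder query per beam
z = code_generator(q, W_c, b_c)                    # (B, r)
y = primary_network(hypernet_affine(z, A, W0), s)  # (B, d)

# Synthesize an extra hypothesis from a midpoint code, without a decoder pass.
z_mid = interpolate(z[0], z[1], 0.5)
y_mid = primary_network(hypernet_affine(z_mid[None], A, W0), s)

# Distillation plus an optional diversity term (weight 0.1 is arbitrary).
loss = distill_loss(y, cross_attention_teacher(q, enc)) + 0.1 * code_diversity_penalty(z)
```

In this sketch the hypernetwork is affine in z and the primary network is linear in its weights. As a result, each beam output is an affine function of its code. The centroid code therefore yields exactly the mean of the beam outputs, and a midpoint code yields the midpoint output. A nonlinear primary network would lose this exact correspondence, although interpolation in code space would still be well defined.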
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Anonymous, "Hypernetwork Cross-Attention", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10640