Inventor(s)

Abstract

Techniques are described for conditioning beam-search decoding on encoder outputs without constructing cross-attention matrices over encoder positions. An encoder output is compressed once into a fixed-size summary vector using learned query vectors. For each beam, a decoder query is mapped to a low-dimensional latent code by a code generator. A hypernetwork generates parameters for a per-beam primary network as a function of the latent code, including affine weight generation. The primary network is applied to the shared summary to produce a beam-specific output vector, with per-beam computation independent of encoder sequence length. A continuous latent code space supports interpolation, extrapolation, and centroid computation to synthesize additional beam hypotheses without additional decoder forward passes. Training may include distillation from a cross-attention model, end-to-end fine-tuning, and optional code diversity regularization.
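The data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the disclosed implementation: all names and sizes (`Q_learned`, `W_code`, `W_hyper`, `b_hyper`, `T`, `d`, `m`, `k`, `B`) are hypothetical, random matrices stand in for learned parameters, the code generator is assumed to be a single `tanh` layer, and the primary network is assumed to be a single linear layer whose weights come from an affine map of the latent code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: encoder length, model dim, learned queries, code dim, beams.
T, d, m, k, B = 50, 16, 4, 3, 5
S = m * d  # size of the fixed summary vector

# Random stand-ins for learned parameters (assumptions, not from the source).
Q_learned = rng.normal(size=(m, d))             # learned query vectors
W_code = rng.normal(size=(d, k)) * 0.1          # code generator weights
W_hyper = rng.normal(size=(k, d * S)) * 0.01    # affine hypernetwork: slope
b_hyper = rng.normal(size=(d * S,)) * 0.01      # affine hypernetwork: offset


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compress(enc):
    """Pool encoder output (T, d) into a fixed-size summary (m*d,), done once."""
    attn = softmax(Q_learned @ enc.T / np.sqrt(d), axis=-1)  # (m, T)
    return (attn @ enc).reshape(-1)


def beam_codes(dec_queries):
    """Map per-beam decoder queries (B, d) to low-dimensional codes (B, k)."""
    return np.tanh(dec_queries @ W_code)


def beam_outputs(codes, summary):
    """Generate per-beam weights from codes and apply them to the shared summary."""
    W = (codes @ W_hyper + b_hyper).reshape(-1, d, S)  # (B, d, S), one matrix per beam
    return W @ summary                                 # (B, d); cost does not depend on T


enc = rng.normal(size=(T, d))
summary = compress(enc)                       # computed once per source sequence
codes = beam_codes(rng.normal(size=(B, d)))
outputs = beam_outputs(codes, summary)

# Extra hypotheses synthesized in code space, with no further decoder passes.
centroid = codes.mean(axis=0, keepdims=True)
midpoint = 0.5 * (codes[0] + codes[1])[None, :]
extra = beam_outputs(np.concatenate([centroid, midpoint]), summary)
```

In this sketch only `compress` touches the encoder positions; everything per beam operates on the `m * d` summary, which is what makes the per-beam cost independent of the source length.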

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
