Abstract

Automatic speech recognition (ASR) machine learning models recognize spoken commands or queries from users. End-to-end ASR models, which directly map a sequence of input acoustic features to a sequence of words, greatly simplify building and maintaining ASR systems. This disclosure describes techniques to improve the performance of end-to-end ASR models by providing predicted user intents as additional inputs. A trained intent prediction network (IPN) generates intent prediction vectors, or intent embeddings, from user-permitted contextual features. The IPN can be trained independently of the ASR model or jointly with it. The IPN can be trained on data that includes user-permitted contextual features, even when that data does not include speech data. The IPN can be retrained when the set of available contextual features changes.
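The flow described in the abstract can be illustrated with a minimal sketch. It is not the disclosed implementation: the intent names, the contextual features, the single-layer softmax network, and the choice to append the intent vector to every acoustic frame are all assumptions made for illustration. The sketch covers only the independently trained case, in which the IPN learns from context-only examples with no speech data.

```python
import math
import random

# Hypothetical intent inventory and contextual features (assumptions for illustration).
INTENTS = ["play_music", "navigate", "call_contact"]
CONTEXT_FEATURES = ["media_app_open", "in_vehicle", "headset_connected"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class IntentPredictionNetwork:
    """Toy single-layer IPN: contextual features -> intent probabilities."""

    def __init__(self, num_features, num_intents, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(num_features)]
                        for _ in range(num_intents)]
        self.bias = [0.0] * num_intents

    def predict(self, context):
        scores = [sum(w * x for w, x in zip(row, context)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(scores)

    def train_step(self, context, intent_index, lr=0.5):
        """One SGD step on cross-entropy loss; uses no speech data."""
        probs = self.predict(context)
        for k in range(len(self.bias)):
            grad = probs[k] - (1.0 if k == intent_index else 0.0)
            self.bias[k] -= lr * grad
            for j, x in enumerate(context):
                self.weights[k][j] -= lr * grad * x


def augment_frames(acoustic_frames, intent_vector):
    """Append the intent vector to every acoustic feature frame."""
    return [list(frame) + list(intent_vector) for frame in acoustic_frames]


ipn = IntentPredictionNetwork(len(CONTEXT_FEATURES), len(INTENTS))

# Made-up context-only training pairs: (feature vector, index of observed intent).
training_data = [([1.0, 0.0, 1.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
for _ in range(200):
    for context, intent_index in training_data:
        ipn.train_step(context, intent_index)

# At recognition time, predict the intent from context and extend the ASR input.
intent_vector = ipn.predict([1.0, 0.0, 1.0])
frames = [[0.12, -0.40], [0.05, 0.33]]  # two made-up 2-dimensional acoustic frames
asr_inputs = augment_frames(frames, intent_vector)
```

In this sketch, retraining for a changed contextual feature set amounts to constructing a new `IntentPredictionNetwork` with a different `num_features` and repeating the training loop. The ASR model would also need to accept the wider input frames, which the sketch does not model.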

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Caseiro, Diamantino; Wu, Zelin; Aleksic, Petar; and Jain, Era, "Intent Prediction Based On Contextual Factors For Better Automatic Speech Recognition", Technical Disclosure Commons, (March 26, 2021)
https://www.tdcommons.org/dpubs_series/4197

Technical Disclosure Commons

Defensive Publications Series

Intent Prediction Based On Contextual Factors For Better Automatic Speech Recognition
