Abstract

This disclosure describes techniques that, with user permission, use text entered by a user to improve automatic speech recognition (ASR). A pre-trained language model and a personalization plan are applied to the user-entered text to build a personalized language model. Because personalization draws on typed text, it can improve the dictation experience even for users who seldom use dictation. Using shallow fusion, the personalized language model, trained only on user-permitted data, is combined with an ASR model; the combination can provide recognition performance superior to that of either component model. Fusion with language models trained through federated learning, as described herein, can improve dictation quality without requiring access to large amounts of transcribed dictation data.
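As a rough illustration of the shallow fusion step named above, the sketch below combines per-token log-probabilities from an ASR model with those from a personalized language model via a weighted sum. The function name, the interpolation weight, and the toy vocabulary are illustrative assumptions, not details taken from the disclosure.

```python
import math

def shallow_fusion_step(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """One decoding step of shallow fusion (illustrative sketch).

    asr_log_probs, lm_log_probs: dicts mapping candidate tokens to
    log-probabilities from the ASR model and the personalized LM.
    lm_weight: assumed interpolation weight applied to the LM score.
    Returns tokens ranked by the fused score
        score(token) = log P_ASR(token) + lm_weight * log P_LM(token).
    """
    fused = {
        tok: asr_log_probs[tok] + lm_weight * lm_log_probs.get(tok, math.log(1e-9))
        for tok in asr_log_probs
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: the personalized LM boosts a name the user often types,
# so "Meena" overtakes the acoustically favored "meet" after fusion.
asr = {"meet": math.log(0.40), "Meena": math.log(0.35), "mean": math.log(0.25)}
lm = {"Meena": math.log(0.70), "meet": math.log(0.20), "mean": math.log(0.10)}
print(shallow_fusion_step(asr, lm))
```

In a full decoder, such fused scores would rank hypotheses at each beam-search step; the sketch only shows the scoring arithmetic for a single step.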

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
