Abstract

Automatic speech recognition (ASR) machine learning models are deployed on client devices that include speech interfaces. ASR models can benefit from continuous learning and adaptation to large-scale changes, e.g., as new words are added to the vocabulary. While federated learning can be used to enable continuous learning for ASR models in a privacy-preserving manner, the trained model can perform poorly on rarely occurring, long-tail words if the training data distribution is skewed and does not adequately represent those words. This disclosure describes federated learning techniques to improve ASR model quality on long-tail words given an imbalanced data distribution. Two approaches are described: probabilistic sampling and client loss weighting. In probabilistic sampling, federated clients whose data includes fewer long-tail words are less likely to be selected during training. In client loss weighting, incorrect predictions on long-tail words are penalized more heavily than incorrect predictions on other words.
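To make the two approaches concrete, below is a minimal Python sketch, assuming each client can report a count of long-tail words in its local data and that a long-tail vocabulary set is known to the trainer. The function names, the +1 smoothing of sampling weights, and the penalty factor alpha are illustrative assumptions, not details from the disclosure.

    import random

    def sample_clients(clients, long_tail_counts, num_selected):
        # Probabilistic sampling: a client's chance of being picked for
        # a training round grows with its long-tail word count, so
        # clients with fewer long-tail words are less likely to be
        # selected. The +1 keeps every client's weight nonzero.
        weights = [1 + long_tail_counts[c] for c in clients]
        return random.choices(clients, weights=weights, k=num_selected)

    def weighted_token_loss(per_token_losses, tokens, long_tail_vocab, alpha=2.0):
        # Client loss weighting: scale the loss on long-tail tokens by
        # alpha > 1 so incorrect predictions on them are penalized more
        # heavily than mistakes on other tokens.
        total = 0.0
        for loss, token in zip(per_token_losses, tokens):
            total += alpha * loss if token in long_tail_vocab else loss
        return total / len(tokens)

    # Hypothetical usage: the client holding the most long-tail words
    # ("c3") is selected most often, and an error on the long-tail word
    # "quokka" counts double in the loss.
    clients = ["c1", "c2", "c3"]
    counts = {"c1": 0, "c2": 5, "c3": 20}
    print(sample_clients(clients, counts, num_selected=2))
    print(weighted_token_loss([0.1, 0.9], ["hello", "quokka"], {"quokka"}))

In this sketch the two mechanisms are independent: sampling reshapes which clients participate in a round, while loss weighting reshapes the gradient signal within each selected client.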

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
