Abstract

Speech recognition is widely used as a voice user interface in settings such as interactive voice response, virtual personal assistants, transcription, and translation applications. Although the technology has advanced far enough to be useful to a sizeable number of speakers, some populations still cannot take full advantage of it. For example, people with impaired speech, speakers of rare languages or dialects, and speakers with strong accents have difficulty using applications that rely on speech recognition. The reason is that there is insufficient data to train an automatic speech recognizer to recognize such relatively rare speech. This disclosure describes techniques for creating a large training set from a small set of speech samples. Acoustic and linguistic features peculiar to a class of speakers are extracted from a small set of their speech samples, with their consent and permission. These features are then presented as constraints to a speech synthesizer in order to generate a larger training set.
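The flow described in the abstract (extract features from a few samples, then use them to constrain a synthesizer) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two features (mean pitch and speaking rate), the Gaussian sampling, and the names `SpeechSample`, `Constraints`, `extract_constraints`, `synthesize`, and `generate_training_set` are hypothetical and do not come from the disclosure. The `synthesize` function is a placeholder that returns a parameter record rather than audio.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class SpeechSample:
    text: str
    pitch_hz: float   # toy acoustic feature (assumption)
    rate_wps: float   # toy speaking rate in words per second (assumption)


@dataclass
class Constraints:
    pitch_mean: float
    pitch_std: float
    rate_mean: float
    rate_std: float


def extract_constraints(samples):
    """Summarize a small set of consented samples as simple statistics."""
    pitches = [s.pitch_hz for s in samples]
    rates = [s.rate_wps for s in samples]
    return Constraints(
        pitch_mean=statistics.mean(pitches),
        pitch_std=statistics.stdev(pitches),
        rate_mean=statistics.mean(rates),
        rate_std=statistics.stdev(rates),
    )


def synthesize(text, pitch_hz, rate_wps):
    """Placeholder for a real speech synthesizer: returns parameters, not audio."""
    return SpeechSample(text, pitch_hz, rate_wps)


def generate_training_set(constraints, prompts, n, seed=0):
    """Draw synthesis parameters around the extracted statistics for each prompt."""
    rng = random.Random(seed)
    generated = []
    for i in range(n):
        text = prompts[i % len(prompts)]
        pitch = rng.gauss(constraints.pitch_mean, constraints.pitch_std)
        # Clamp the rate so the toy synthesizer never receives a non-positive value.
        rate = max(0.1, rng.gauss(constraints.rate_mean, constraints.rate_std))
        generated.append(synthesize(text, pitch, rate))
    return generated


# A small set of seed samples from the target class of speakers (made-up values).
seed_samples = [
    SpeechSample("hello", 110.0, 1.8),
    SpeechSample("good morning", 120.0, 2.0),
    SpeechSample("thank you", 130.0, 2.2),
]
constraints = extract_constraints(seed_samples)
training_set = generate_training_set(
    constraints, ["turn on the lights", "call home"], n=1000
)
```

A real system would replace the two scalar features with whatever acoustic and linguistic features characterize the speaker class, and the placeholder with an actual synthesizer that accepts them as constraints.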

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Kanevsky, Dimitri; Senior, Andrew; and Basson, Sara, "Generation of speech training data for special speech recognition tasks", Technical Disclosure Commons, (July 24, 2017)
https://www.tdcommons.org/dpubs_series/605

Technical Disclosure Commons

Defensive Publications Series

Generation of speech training data for special speech recognition tasks