Speech recognition is widely used as a voice user interface in settings such as interactive voice response, virtual personal assistants, transcription, and translation applications. Although speech recognition technology has advanced far enough to be useful to a sizeable number of human speakers, there are still populations that cannot take full advantage of it. For example, people with impaired speech, speakers of rare languages or dialects, and speakers with strong accents have difficulty using applications that rely on speech recognition. The reason is that there is insufficient data to train an automatic speech recognizer to recognize such relatively rare speech. This disclosure describes techniques for creating a large training set out of a small set of speech samples. Acoustic and linguistic features peculiar to a class of speakers are extracted from a small set of their speech samples, with their consent and permission. These features are then presented as constraints to a speech synthesizer in order to generate a larger training set.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.