Speaker recognition is the task of automatically recognizing a person from their voice. Training a speaker recognition model typically requires a very large corpus, e.g., multiple voice samples from each of a very large number of individuals. In the diverse application domains of speaker recognition, it is often impractical to obtain a training corpus of the requisite size. This disclosure describes techniques that augment utterances, e.g., by cutting, splitting, or shuffling them, such that the need to collect raw voice samples from individuals is substantially reduced. In effect, with the augmented utterances, the original model performs better on the target domain.
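The cutting, splitting, and shuffling operations named above can be illustrated with a minimal sketch. The abstract does not specify how these operations are parameterized, so everything here is an assumption made for illustration: the function names (`split_utterance`, `cut_utterance`, `shuffle_segments`, `augment`), the fixed segment length, the seeded random generator, and the representation of an utterance as a plain list of audio samples.

```python
import random
from typing import List, Sequence


def split_utterance(samples: Sequence[float], segment_len: int) -> List[List[float]]:
    """Split an utterance into consecutive segments (the last may be shorter)."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(samples[i:i + segment_len])
            for i in range(0, len(samples), segment_len)]


def cut_utterance(samples: Sequence[float], start: int, end: int) -> List[float]:
    """Cut out the sub-utterance covering samples[start:end]."""
    return list(samples[start:end])


def shuffle_segments(samples: Sequence[float], segment_len: int,
                     rng: random.Random) -> List[float]:
    """Split an utterance, reorder the segments at random, and rejoin them."""
    segments = split_utterance(samples, segment_len)
    rng.shuffle(segments)
    return [sample for segment in segments for sample in segment]


def augment(samples: Sequence[float], segment_len: int,
            num_variants: int, seed: int = 0) -> List[List[float]]:
    """Produce several shuffled variants of a single utterance.

    Hypothetical helper: the disclosure does not state how many variants
    are generated per utterance or how segment lengths are chosen.
    """
    rng = random.Random(seed)
    return [shuffle_segments(samples, segment_len, rng)
            for _ in range(num_variants)]


# Toy "utterance" of 10 samples standing in for real audio.
utterance = [float(i) for i in range(10)]
variants = augment(utterance, segment_len=4, num_variants=3)
```

Each variant contains the same samples as the original utterance in a different segment order, so one recorded utterance yields several training examples without any new recording.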
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Shi, Jin; Wang, Quan; Fang, Yeming; Feng, Gang; Chen, Zhengying; Pelecanos, Jason; Moreno, Ignacio Lopez; Chu, Andrea; and Moreno Mengibar, Pedro, "Utterance Augmentation for Speaker Recognition", Technical Disclosure Commons (May 18, 2020)