Defensive Publications Series

Utterance Augmentation for Speaker Recognition

Jin Shi
Quan Wang
Yeming Fang
Gang Feng
Zhengying Chen
Jason Pelecanos
Ignacio Lopez Moreno
Andrea Chu
Pedro Moreno Mengibar

Abstract

Speaker recognition is the problem of automatically recognizing a person from their voice. Training a speaker recognition model typically requires a very large corpus, e.g., multiple voice samples from each of a very large number of individuals. In many of the domains where speaker recognition is applied, it is impractical to obtain a training corpus of that size. This disclosure describes techniques that augment utterances, e.g., by cutting, splitting, or shuffling them, so that substantially fewer raw voice samples need to be collected from individuals. In effect, the augmented utterances allow the original model to work better on the target domain.
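The abstract names cutting, splitting, and shuffling but gives no parameters or procedure. As an illustration only, the Python sketch below shows one way a single labeled utterance could be expanded into several training examples. It assumes an utterance is a plain sequence of audio samples or frame features, and the function names (`cut`, `split`, `shuffle`, `augment`), the segment count, and the crop length are hypothetical choices, not the disclosure's method.

```python
import random
from typing import List, Sequence, Tuple

# Assumption: an utterance is modeled as a plain sequence of samples (or
# frame-level features); the speaker label is carried alongside it.
Utterance = Sequence[float]


def cut(utt: Utterance, length: int, rng: random.Random) -> List[float]:
    """Return a random contiguous crop of `length` samples."""
    if length > len(utt):
        raise ValueError("crop length exceeds utterance length")
    start = rng.randint(0, len(utt) - length)  # inclusive upper bound
    return list(utt[start:start + length])


def split(utt: Utterance, num_segments: int) -> List[List[float]]:
    """Split an utterance into `num_segments` contiguous, non-overlapping pieces.

    Leftover samples are appended to the last piece so no audio is dropped.
    """
    if not 1 <= num_segments <= len(utt):
        raise ValueError("num_segments must be between 1 and len(utt)")
    size = len(utt) // num_segments
    pieces = [list(utt[i * size:(i + 1) * size]) for i in range(num_segments - 1)]
    pieces.append(list(utt[(num_segments - 1) * size:]))
    return pieces


def shuffle(utt: Utterance, num_segments: int, rng: random.Random) -> List[float]:
    """Split into segments, permute their order, and concatenate them again."""
    pieces = split(utt, num_segments)
    rng.shuffle(pieces)
    return [x for piece in pieces for x in piece]


def augment(utt: Utterance, speaker: str, rng: random.Random,
            num_segments: int = 3, crop_len: int = 4) -> List[Tuple[List[float], str]]:
    """Turn one labeled utterance into several labeled training examples.

    Every derived utterance keeps the original speaker label, since cutting,
    splitting, or reordering the audio does not change who is speaking.
    """
    examples = [(list(utt), speaker)]
    examples.append((cut(utt, crop_len, rng), speaker))
    examples.extend((piece, speaker) for piece in split(utt, num_segments))
    examples.append((shuffle(utt, num_segments, rng), speaker))
    return examples


if __name__ == "__main__":
    rng = random.Random(0)
    utterance = [float(i) for i in range(10)]  # toy 10-sample "utterance"
    for samples, label in augment(utterance, "speaker_a", rng):
        print(label, samples)
```

Because every derived utterance inherits its source speaker label, one recording contributes several examples (six in this toy configuration), which illustrates how augmentation of this kind can reduce the number of raw recordings that must be collected.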

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Shi, Jin; Wang, Quan; Fang, Yeming; Feng, Gang; Chen, Zhengying; Pelecanos, Jason; Moreno, Ignacio Lopez; Chu, Andrea; and Moreno Mengibar, Pedro, "Utterance Augmentation for Speaker Recognition", Technical Disclosure Commons, (May 18, 2020)
https://www.tdcommons.org/dpubs_series/3238