Abstract
Voice communication can be difficult for people with impaired or accented speech. When such users communicate with others via applications on their devices, listeners often find it difficult to understand them. This disclosure describes techniques that, with permission, dynamically process impaired or accented speech and convert it to synthesized canonical speech. The synthesized speech is generated with low latency as the user speaks, enabling the parties to communicate smoothly regardless of the speaker’s speech impairment. Listeners receive clear, fluent speech automatically generated by suitably trained models. In addition, users can personalize the conversion based on their specific speech impairments. The techniques can be integrated within any messaging, conferencing, or phone calling/dialer application on any device, making such applications more accessible to users with impaired speech and enhancing the user experience.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Feng, Gang; Zhang, Xia; Mengibar, Pedro Moreno; Biadsy, Fadi; Jiang, Liyang; Rybakov, Oleg; Wu, Yuexin; and Chen, Joseph, "Automated Conversion of Impaired Speech in Communication Applications", Technical Disclosure Commons, (March 07, 2023)
https://www.tdcommons.org/dpubs_series/5719