Abstract

Many voice-based assistive technologies transmit the voice input received from users to a server for processing. The transmitted audio includes the speaker’s voice, which can identify the person. Users of such technologies therefore face a tradeoff between convenient voice interfaces with reduced privacy and less convenient non-voice input with higher privacy. Techniques described herein mask a user’s voice by locally processing the voice input received by a device. The masked voice cannot be used to personally identify the user, yet it still allows server-side processing to recognize spoken phrases. Applying the proposed techniques gives the user greater privacy without diminishing the voice-input experience in terms of recognition, latency, and other operational characteristics.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
