Automatic speech recognition (ASR) software, e.g., as implemented in voice-activated virtual assistants or other applications, is prone to confusing words that sound the same (homophones) or similar. Keyboard-style corrections, e.g., suggestions based on the edit distance between the spellings of transcribed words, are suboptimal for such transcription errors, because words that sound alike can be spelled very differently. This disclosure describes techniques that predict the N-best speech-to-text transcription alternatives for a given word, wherein the suggested replacements are homophones or words with similar sounds. The techniques can be used in any context where automatic speech recognition is used, e.g., to enable correction of commands provided to a virtual assistant, to modify transcribed speech, etc.
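
As a rough illustration of why sound-based suggestions differ from spelling-based ones, the following minimal sketch ranks replacement candidates by edit distance over phoneme sequences rather than over letters. It is not the method of the disclosure, which may use a learned model or the recognizer's own hypotheses. The toy lexicon, its ARPAbet-style phoneme strings, and the function names `edit_distance` and `n_best_alternatives` are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy pronunciation lexicon (ARPAbet-style phonemes, stress omitted).
# A real system would use a full pronunciation dictionary or a learned model.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "write": ("R", "AY", "T"),
    "right": ("R", "AY", "T"),
    "rite": ("R", "AY", "T"),
    "ride": ("R", "AY", "D"),
    "white": ("W", "AY", "T"),
    "writ": ("R", "IH", "T"),
    "wrote": ("R", "OW", "T"),
    "route": ("R", "UW", "T"),
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over any two sequences (letters or phonemes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def n_best_alternatives(word: str, n: int = 3) -> List[str]:
    """Return up to n lexicon words whose pronunciation is closest to `word`."""
    target = LEXICON[word]
    scored = [
        (edit_distance(target, phones), other)
        for other, phones in LEXICON.items()
        if other != word
    ]
    scored.sort()  # by phoneme distance, ties broken alphabetically
    return [other for _, other in scored[:n]]


if __name__ == "__main__":
    # Spelling distance between "right" and "write" is large ...
    print(edit_distance("right", "write"))                      # 4
    # ... while their phoneme distance is zero (they are homophones).
    print(edit_distance(LEXICON["right"], LEXICON["write"]))    # 0
    print(n_best_alternatives("right", 3))                      # ['rite', 'write', 'ride']
```

In this sketch, "write" is four letter edits away from "right" and would rank poorly in a keyboard-style suggestion list, yet it ranks at the top when candidates are compared by sound.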

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.