Automatic speech recognition (ASR) systems typically transcribe each utterance of a conversation independently. This often leads to errors such as the incorrect transcription of homophones, and these errors cascade into further problems during natural language understanding. This disclosure presents speech recognition techniques that transcribe each utterance using the larger context of the dialog in which it occurs. The techniques distinguish homophones by context and improve in-dialog ASR without relying on supervised data or manually provided phrases. The techniques generalize well to unseen dialogs or queries.

This work is licensed under a Creative Commons Attribution 4.0 License.