While voice input has become a popular way of interacting with devices, incorrect transcription is a common source of user frustration. Speech-to-text (STT) conversion errors can require users to repeat the spoken input, manually issue a correction command, or switch to a non-voice modality to make corrections. This disclosure describes techniques to automatically play audio cues when the confidence in the accuracy of speech transcription is low. The cues enable timely, inline correction of the transcript as the user speaks, in a manner akin to human conversation. The cues can include a discernible tone or beep, or spoken phrases indicating that particular spoken phrases were not transcribed with sufficient confidence.
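As a rough illustration of the idea, the following Python sketch selects which transcribed phrases would trigger a cue. It assumes the STT engine reports a per-phrase confidence score between 0 and 1. The `Segment` type, the `cues_for` function, the 0.6 threshold, and the wording of the spoken cue are all hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str          # transcribed phrase
    confidence: float  # assumed per-phrase confidence score in [0, 1]


# Assumed cutoff for illustration; the disclosure does not give a value.
LOW_CONFIDENCE_THRESHOLD = 0.6


def cues_for(segments, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Return (segment index, spoken cue) pairs for low-confidence phrases."""
    cues = []
    for index, segment in enumerate(segments):
        if segment.confidence < threshold:
            # A tone or beep could be played here instead of a spoken phrase.
            cues.append((index, f'Did you say "{segment.text}"?'))
    return cues
```

In a real system the cue would be played as audio while the user is still speaking, so that the correction can happen inline; this sketch only covers the decision of when a cue is warranted.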

This work is licensed under a Creative Commons Attribution 4.0 License.