Abstract
While voice input has become a popular way of interacting with devices, user frustration due to incorrect transcription is common. Speech-to-text (STT) conversion errors can require users to repeat the spoken input, manually issue a correction command, or switch to a non-voice modality to make corrections. This disclosure describes techniques to automatically play audio cues when confidence in the accuracy of the speech transcription is low. The cues enable timely, inline correction of the transcript as the user speaks, in a manner akin to human conversation. The cues can include a discernible tone or beep, or spoken phrases indicating that particular spoken phrases were not transcribed with sufficient confidence.
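The cue-selection idea in the abstract can be illustrated with a minimal sketch. It assumes the STT engine reports a per-segment confidence score between 0 and 1. The `Segment` type, the `plan_cues` function, the 0.6 threshold, the `"BEEP"` placeholder, and the wording of the spoken prompt are all hypothetical names and values chosen for illustration; the disclosure does not specify any of them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical cutoff; the disclosure does not give a threshold value.
LOW_CONFIDENCE_THRESHOLD = 0.6


@dataclass
class Segment:
    """A transcribed phrase with an assumed confidence score in [0.0, 1.0]."""
    text: str
    confidence: float


def plan_cues(
    segments: List[Segment],
    threshold: float = LOW_CONFIDENCE_THRESHOLD,
    spoken: bool = False,
) -> List[Tuple[str, Optional[str]]]:
    """Pair each segment with an audio cue, or None if confidence is sufficient.

    A low-confidence segment gets either a tone placeholder ("BEEP") or,
    when spoken=True, a spoken prompt that repeats the uncertain phrase.
    """
    plan: List[Tuple[str, Optional[str]]] = []
    for seg in segments:
        if seg.confidence >= threshold:
            plan.append((seg.text, None))
        elif spoken:
            plan.append((seg.text, f'Sorry, did you say "{seg.text}"?'))
        else:
            plan.append((seg.text, "BEEP"))
    return plan


if __name__ == "__main__":
    utterance = [Segment("send a message to", 0.93), Segment("Anneke", 0.41)]
    for text, cue in plan_cues(utterance):
        print(text, "->", cue)
```

In a real system the cue would be played while the user is still speaking, so that the correction can be given inline. This sketch only decides which cue, if any, accompanies each segment.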
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Ling, Gaetano; Schladow, Amelia; Colville, Michael; Sibigtroth, Matthew; Rickerby, George Joseph; and Pawle, Benjamin Guy Alexander, "Enabling Inline Correction of Speech Transcript via Audio Cues", Technical Disclosure Commons, (April 05, 2023)
https://www.tdcommons.org/dpubs_series/5782