Speech interfaces provide a natural and accessible way for users who cannot read to interact with computing devices and perform information-centric tasks. To let users verify that a spoken query has been understood correctly, devices such as smart displays and smartphones show a streaming transcription of the query, obtained from a speech recognition engine. However, verification based on a text transcription is not usable by those who cannot read. The techniques described in this disclosure generate and provide an audiovisual transcription of the user’s speech input during and/or after the input, enabling users who cannot read to verify that the speech input was understood correctly.

This work is licensed under a Creative Commons Attribution 4.0 License.