Abstract

Machine-generated speech transcription is a feature of several products, such as videoconferencing software and mobile operating systems. However, automatic transcribers often fail to accurately recognize some types of real-world user speech. Spoken terms that are phonetically similar but have different meanings can cause errors in machine-generated transcription. Although automatic transcribers evaluate several probable phrases as candidates for the spoken phrase, analysis of sound alone is not enough to recognize speech accurately.

Per the techniques of this disclosure, a machine transcription model generates probable options for the spoken phrase and evaluates those options based in part on available visual context, used with user permission. Such visual content is analyzed to determine whether it contains text. If text is detected, optical character recognition (OCR) techniques are applied to recognize the text, and the recognized text is used to improve the accuracy of transcription.
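
The abstract does not specify how recognized text influences the choice among candidate phrases. As one possible illustration, the sketch below re-ranks candidates by adding a bonus for each candidate word that also appears in the OCR output. The function names `tokenize` and `rescore`, the `boost` parameter, the scores, and the sample phrases are all hypothetical and are not taken from the disclosure; the OCR step itself is assumed to have already produced a text string.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rescore(candidates, ocr_text, boost=0.5):
    """Re-rank candidate phrases using text recognized from visual context.

    candidates: list of (phrase, acoustic_score) pairs; higher is better.
    ocr_text:   text already extracted from the visual content (assumed input).
    boost:      hypothetical bonus per candidate word found in ocr_text.
    """
    visible_words = set(tokenize(ocr_text))
    rescored = []
    for phrase, score in candidates:
        matches = sum(1 for word in tokenize(phrase) if word in visible_words)
        rescored.append((phrase, score + boost * matches))
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Two phonetically similar candidates; the scores are made up for illustration.
candidates = [("recognize speech", -1.2), ("wreck a nice beach", -1.0)]
ocr_text = "Agenda: how to recognize speech in noisy rooms"

best_phrase, best_score = rescore(candidates, ocr_text)[0]
print(best_phrase)
```

With no visual text available, the ranking falls back to the acoustic scores alone, so the second candidate would win in this example. A simple additive bonus is only one design choice; a real system might instead bias a language model toward the recognized words.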

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

N/A, "Using Visual Context to Improve Accuracy of Automated Speech Transcription", Technical Disclosure Commons, (September 21, 2020)
https://www.tdcommons.org/dpubs_series/3617

Technical Disclosure Commons

Defensive Publications Series

Using Visual Context to Improve Accuracy of Automated Speech Transcription
