Abstract

Speech input, detected by on-device microphones, lets users provide voice commands or other text faster than typing, but transcription accuracy can suffer in certain environments, e.g., in the presence of background noise. Gesture input via a virtual keyboard is convenient but slower and also prone to errors, e.g., when a gesture resolves to multiple candidate words. Per techniques of this disclosure, a user can provide spoken and gesture input to a user device at the same time. Each input is converted to text separately. The resulting stereo text stream is time-aligned using word-to-word distance to obtain time-aligned query text. The time-aligned query text is then semantically filtered to generate the text conversion of the user input. The techniques can be implemented on any user device and used to process user input such as queries provided to a virtual assistant or other applications on the user device. Fusing the voice and gesture modes can increase the accuracy, speed, and reliability of text input.
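
The abstract does not specify how the alignment or filtering is carried out, so the following Python sketch is only one possible reading of the pipeline, not the disclosed implementation. It assumes each recognizer emits words with a start time and a confidence score (the `TimedWord` type is hypothetical), uses Levenshtein edit distance as the word-to-word distance, pairs words whose start times fall within an assumed `max_gap` window, and substitutes a simple confidence comparison for the semantic filter.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One recognized word (hypothetical type, not from the disclosure)."""
    text: str
    start: float       # seconds since the start of the input
    confidence: float  # recognizer confidence in [0, 1], assumed available


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as the word-to-word distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def align(voice, gesture, max_gap=0.5):
    """Pair each voice word with the nearby gesture word closest in spelling."""
    pairs, used = [], set()
    for v in voice:
        candidates = [
            (edit_distance(v.text, g.text), abs(v.start - g.start), i)
            for i, g in enumerate(gesture)
            if i not in used and abs(v.start - g.start) <= max_gap
        ]
        if candidates:
            _, _, best = min(candidates)
            used.add(best)
            pairs.append((v, gesture[best]))
        else:
            pairs.append((v, None))
    # Keep gesture words that found no voice partner.
    pairs += [(None, g) for i, g in enumerate(gesture) if i not in used]
    pairs.sort(key=lambda p: (p[0] or p[1]).start)
    return pairs


def fuse(voice, gesture, max_gap=0.5):
    """Pick one word per aligned pair; confidence stands in for semantic filtering."""
    words = []
    for v, g in align(voice, gesture, max_gap):
        if v is None or g is None:
            words.append((v or g).text)
        elif v.confidence >= g.confidence:
            words.append(v.text)
        else:
            words.append(g.text)
    return " ".join(words)


# Made-up example: noise garbles "timer" in the voice stream, and the
# gesture resolves "ten" to the wrong word.
voice = [TimedWord("set", 0.0, 0.9), TimedWord("a", 0.4, 0.8),
         TimedWord("time", 0.6, 0.4), TimedWord("for", 1.0, 0.9),
         TimedWord("ten", 1.3, 0.9), TimedWord("minutes", 1.6, 0.9)]
gesture = [TimedWord("set", 0.1, 0.9), TimedWord("a", 0.5, 0.7),
           TimedWord("timer", 0.7, 0.8), TimedWord("for", 1.1, 0.9),
           TimedWord("tin", 1.4, 0.3), TimedWord("minutes", 1.7, 0.9)]
print(fuse(voice, gesture))  # set a timer for ten minutes
```

In this sketch each stream corrects the other's error: the gesture stream supplies "timer" and the voice stream supplies "ten". A real semantic filter would presumably score candidate words against the surrounding query context rather than rely on per-word confidence alone.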

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Shin, D, "Fusing Voice and Gesture Input to Improve Text Transcription Accuracy", Technical Disclosure Commons, (December 19, 2023)
https://www.tdcommons.org/dpubs_series/6513

Technical Disclosure Commons

Defensive Publications Series

Fusing Voice and Gesture Input to Improve Text Transcription Accuracy
