D Shin


Voice queries to a virtual assistant can be misinterpreted when they are issued in noisy environments, where a user’s speech (query) is mixed with other background speech. Such misinterpretation can result in the virtual assistant response being inaccurate and/or unexpected. This disclosure describes techniques that employ large language models (LLMs) to obtain an accurate transcription of a user’s voice input to a virtual assistant when the assistant is triggered with a manual action such as a button press. The techniques can obtain an accurate transcription of the user’s query even when the user’s voice input is mixed with other speech occurring in the background while the user is speaking. The various speech signals in the input are split, each split portion of the signal is transcribed separately, and the transcriptions are evaluated to identify the speech signal that is most likely to be the user’s query. The start and end times of the manual trigger are utilized in this process to select the transcript associated with the user’s voice input.
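The selection step above can be sketched as follows. This is a minimal illustration, assuming that speech separation and per-stream transcription have already produced timestamped transcripts; the `TranscribedSegment` type and the intersection-over-union scoring against the trigger window are illustrative assumptions, not the disclosure's exact method.

```python
from dataclasses import dataclass

@dataclass
class TranscribedSegment:
    """One separated speech stream, already transcribed (hypothetical type)."""
    transcript: str
    start: float  # seconds: onset of this speech activity in the recording
    end: float    # seconds: offset of this speech activity

def trigger_alignment(seg: TranscribedSegment, t0: float, t1: float) -> float:
    """Score how well a segment's timing matches the manual-trigger window
    [t0, t1] (e.g., button press to release), using intersection over union.
    A segment that both covers the window and does not extend far beyond it
    scores highest."""
    inter = max(0.0, min(seg.end, t1) - max(seg.start, t0))
    union = max(seg.end, t1) - min(seg.start, t0)
    return inter / union if union > 0 else 0.0

def select_user_query(segments: list[TranscribedSegment],
                      trigger_start: float, trigger_end: float) -> str:
    """Pick the transcript most likely to be the user's query: the one whose
    speech activity best aligns with the manual-trigger start and end."""
    best = max(segments, key=lambda s: trigger_alignment(s, trigger_start, trigger_end))
    return best.transcript

# Example: the user speaks during the button press; a TV plays throughout.
segments = [
    TranscribedSegment("turn on the lights", start=1.0, end=2.5),
    TranscribedSegment("news anchor speaking in the background", start=0.0, end=5.0),
]
print(select_user_query(segments, trigger_start=0.9, trigger_end=2.6))
# → turn on the lights
```

The long-running background stream overlaps the trigger window too, which is why a pure-overlap score would not suffice; penalizing speech outside the window (here via the union term) is one plausible way to exploit both the start and the end of the manual trigger.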

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.