D Shin


Voice queries to a virtual assistant can be misinterpreted when they are issued in noisy environments, where a user’s speech (query) is mixed with other background speech. Such misinterpretation can result in the virtual assistant response being inaccurate and/or unexpected. This disclosure describes techniques that employ large language models (LLMs) to obtain an accurate transcription of a user’s voice input to a virtual assistant when the assistant is triggered with a manual action such as a button press. The techniques can obtain an accurate transcription of the user’s query even when the user’s voice input is mixed with other speech occurring in the background while the user is speaking. The various speech signals in the input are split, each split portion of the signal is transcribed separately, and the transcriptions are evaluated to identify the speech signal that is most likely to be the user’s query. The start and end times of the manual trigger are utilized in this process to select the transcript associated with the user’s voice input.
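The selection step above can be sketched as follows. This is a minimal illustration, assuming that speech separation and per-stream transcription have already produced timestamped transcripts; the `TranscribedSegment` type and the intersection-over-union scoring against the trigger window are illustrative assumptions, not the disclosure's exact method.

```python
from dataclasses import dataclass

@dataclass
class TranscribedSegment:
    """One separated speech stream, already transcribed (hypothetical type)."""
    transcript: str
    start: float  # seconds: onset of this speech activity in the recording
    end: float    # seconds: offset of this speech activity

def trigger_alignment(seg: TranscribedSegment, t0: float, t1: float) -> float:
    """Score how well a segment's timing matches the manual-trigger window
    [t0, t1] (e.g., button press to release), using intersection over union.
    A segment that both covers the window and does not extend far beyond it
    scores highest."""
    inter = max(0.0, min(seg.end, t1) - max(seg.start, t0))
    union = max(seg.end, t1) - min(seg.start, t0)
    return inter / union if union > 0 else 0.0

def select_user_query(segments: list[TranscribedSegment],
                      trigger_start: float, trigger_end: float) -> str:
    """Pick the transcript most likely to be the user's query: the one whose
    speech activity best aligns with the manual-trigger start and end."""
    best = max(segments, key=lambda s: trigger_alignment(s, trigger_start, trigger_end))
    return best.transcript

# Example: the user speaks during the button press; a TV plays throughout.
segments = [
    TranscribedSegment("turn on the lights", start=1.0, end=2.5),
    TranscribedSegment("news anchor speaking in the background", start=0.0, end=5.0),
]
print(select_user_query(segments, trigger_start=0.9, trigger_end=2.6))
# → turn on the lights
```

The long-running background stream overlaps the trigger window too, which is why a pure-overlap score would not suffice; penalizing speech outside the window (here via the union term) is one plausible way to exploit both the start and the end of the manual trigger.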

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.