Abstract

Conversational interfaces enable users to interact with a virtual assistant, chatbot, or other software via spoken audio. In a cascaded conversational system architecture, an automatic speech recognition (ASR) model transcribes a user's spoken query to text, a large language model (LLM) generates the text of a response, and a text-to-speech (TTS) model generates response audio from the response text. This configuration suffers from high latency because the LLM must wait for the ASR model to produce a transcription before the response can be generated, and producing a high-quality, full-context transcription can take the ASR model considerable time. Per techniques of this disclosure, in addition to generating the response based on an initial transcription obtained from a first pass of the ASR model, the LLM is tasked with dynamically determining whether to use that initial transcription or to wait for a more accurate subsequent pass of the model. If the LLM determines that the initial transcription is irrelevant to the ongoing conversation or contains misrecognitions, no response is generated based on the initial transcription; instead, the LLM waits for the more accurate second-pass transcription. Conversely, if the first-pass transcription is accurate and relevant, the second pass is skipped (or stopped, if already initiated).
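
The decision flow can be illustrated with a minimal sketch. The function names below (fast_asr_pass, full_asr_pass, llm_judge_transcript, llm_generate_response) are hypothetical placeholders standing in for the ASR passes and LLM calls described above, not APIs of any particular library or the disclosed implementation.

```python
# Sketch of the two-pass transcription decision flow (illustrative only).
# All helpers are hypothetical stubs; a real system would call actual
# ASR, LLM, and TTS services.
from concurrent.futures import ThreadPoolExecutor
import time

pool = ThreadPoolExecutor(max_workers=1)


def fast_asr_pass(audio: bytes) -> str:
    """Low-latency first ASR pass (stubbed)."""
    return "whats the weather in paris"


def full_asr_pass(audio: bytes) -> str:
    """Slower, full-context second ASR pass with higher accuracy (stubbed)."""
    time.sleep(0.5)  # stands in for the extra decoding time
    return "what's the weather in Paris today"


def llm_judge_transcript(transcript: str, history: list[str]) -> bool:
    """Ask the LLM whether the first-pass transcript is accurate and relevant
    enough to respond to immediately (stubbed heuristic for illustration)."""
    return len(transcript.split()) > 2


def llm_generate_response(transcript: str, history: list[str]) -> str:
    """Generate the assistant's reply from the chosen transcript (stubbed)."""
    return f"Responding to: {transcript}"


def handle_user_turn(audio: bytes, history: list[str]) -> str:
    # Start the slower second pass in the background so that, if it is
    # needed, no additional latency is incurred by launching it late.
    second_pass = pool.submit(full_asr_pass, audio)

    # Respond from the fast first pass when the LLM deems it usable.
    first_transcript = fast_asr_pass(audio)
    if llm_judge_transcript(first_transcript, history):
        # Attempt to skip the second pass; if it has already started,
        # its result is simply never awaited.
        second_pass.cancel()
        return llm_generate_response(first_transcript, history)

    # Otherwise wait for the more accurate second-pass transcription.
    return llm_generate_response(second_pass.result(), history)


if __name__ == "__main__":
    print(handle_user_turn(b"...", history=[]))
```

In this sketch the second pass runs concurrently with the first-pass judgment, so the fast path adds no waiting and the slow path pays only the cost of the full-context transcription.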

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
