Abstract
AI-generated responses in sales calls often suffer significant delays, typically 4 seconds or more. Contributing stages include 700 ms for Speech-to-Text (STT), 2 seconds for AI response generation, and 400 ms for Text-to-Speech (TTS). Adding Retrieval-Augmented Generation (RAG) can extend the total to 5-7 seconds, which leads to customer dissatisfaction. This disclosure describes technical measures to reduce these delays. Running GPT-4 in streaming mode and synthesizing speech sentence by sentence can cut response time by about 1 second. Concurrently matching the incoming request against existing responses can reduce it further. If a match is found, a pre-recorded voice response is delivered immediately. If not, transitional words buy time for GPT-4 to generate a response, allowing for a 1-second response time without TTS.
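The decision flow described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the canned-response table, the clip paths, the filler wording, and the function names (normalize, match_canned, sentences, respond) are all hypothetical, the matching is a simple exact lookup, and the lookup runs sequentially here rather than concurrently with the GPT-4 request. The token stream is passed in as a plain iterable so that no real STT, LLM, or TTS API is assumed.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical table: normalized customer question -> pre-recorded audio clip.
CANNED = {
    "what is the price": "audio/price.wav",
    "do you offer a free trial": "audio/trial.wav",
}
# Hypothetical transitional phrase spoken while the model is still generating.
FILLER = "Good question, let me check."


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so near-identical questions match."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()


def match_canned(utterance: str) -> Optional[str]:
    """Return the clip for an exact normalized match, else None."""
    return CANNED.get(normalize(utterance))


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into sentences so TTS can start on the first one.

    Naive rule: a sentence ends when the buffer ends in '.', '!' or '?'.
    """
    buf = ""
    for tok in tokens:
        buf += tok
        if re.search(r"[.!?]\s*$", buf):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()  # flush a trailing fragment with no final punctuation


def respond(utterance: str, llm_tokens: Iterable[str]) -> List[Tuple[str, str]]:
    """Return the ordered playback actions for one customer turn."""
    clip = match_canned(utterance)
    if clip is not None:
        # Match found: play the pre-recorded answer, skipping the LLM and TTS.
        return [("play_clip", clip)]
    # No match: speak a filler first, then each sentence as it completes.
    actions = [("speak", FILLER)]
    actions += [("speak", s) for s in sentences(llm_tokens)]
    return actions
```

For example, respond("What is the price?", []) yields a single play_clip action, while an unmatched question yields the filler followed by one speak action per completed sentence. In a live system the sentences would be handed to TTS as they arrive rather than collected into a list.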
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Opor, Liy, "Reduce delays in AI-powered Calls", Technical Disclosure Commons, (January 13, 2025)
https://www.tdcommons.org/dpubs_series/7724