Abstract

Proposed herein is a unified, graphics processing unit (GPU)-efficient architecture for real-time and post-meeting speech intelligence that simultaneously handles transcription, translation, summarization, and action-item extraction without redundant computation. By tightly coupling a shared audio encoder, streaming multimodal caching, blank frame skipping, and persistent storage, the system substantially reduces latency and cost, enabling scalable, low-latency meeting analysis across multiple tasks, even for long-form audio sessions.
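The core idea of sharing one encoder pass across several downstream tasks, combined with blank frame skipping, can be illustrated with a minimal sketch. All names here (`encode_shared`, `run_tasks`, the energy threshold, and the toy "encoder") are hypothetical stand-ins, not the paper's actual implementation:

```python
import numpy as np

def encode_shared(audio_frames, energy_threshold=0.01):
    """Hypothetical shared encoder pass: skip near-silent ("blank") frames
    so no compute is spent on silence, and cache one set of features."""
    encoded = {}
    for i, frame in enumerate(audio_frames):
        if np.abs(frame).mean() < energy_threshold:
            continue  # blank frame skipping
        encoded[i] = frame * 2.0  # stand-in for a real encoder forward pass
    return encoded

def run_tasks(encoded, tasks):
    """Every task head reads the same cached features; the audio is never
    re-encoded per task, which is the source of the claimed savings."""
    return {t: f"{t}: {len(encoded)} frames" for t in tasks}

frames = [np.zeros(4), np.ones(4), np.full(4, 0.5)]  # 1 silent, 2 voiced
feats = encode_shared(frames)
results = run_tasks(feats, ["transcribe", "translate", "summarize"])
```

In this toy run the silent frame is dropped before encoding, and the three task heads all consume the same two cached feature frames rather than triggering three separate encoder passes.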

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.