In conference calls, audio and video streams from different sources arrive with varying latency. Each receiver uses a jitter buffer to align packets in time. The buffer must be long enough to absorb the variation in packet delay so that audio is always available to send to the output device, yet latency requirements constrain it to be short. Even so, jitter buffers can delay signals by hundreds of milliseconds, which degrades conferencing quality, e.g., by causing participants to speak simultaneously.
The techniques of this disclosure provide participants with a look-ahead indication of which participant is about to speak. The look-ahead indicator is activated before the corresponding audio actually reaches the loudspeaker or headset. The indicator, generated only when participants permit analysis of call audio for this purpose, is based on cues in the audio or video streams held in the jitter buffers of conference participants, and it informs other participants that a given participant is about to start speaking. The indicator can be conveyed in any mode, including audio, visual, and tactile.
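As an illustration of the look-ahead idea, the following minimal Python sketch scans the frames still waiting in each participant's jitter buffer and reports how long before playout a speech-like run begins. It is not the method of the disclosure: the cue used here (a simple frame-energy threshold), the `Frame` type, the function names, and the threshold values are all assumptions made for the example. A real system would presumably use a more robust voice activity detector and possibly video cues.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Frame:
    """Hypothetical buffered audio frame (not from the disclosure)."""
    samples: List[float]  # PCM samples, assumed normalized to [-1.0, 1.0]
    playout_ms: int       # scheduled playout time of this frame


def frame_energy(samples: List[float]) -> float:
    """Mean squared amplitude of a frame; 0.0 for an empty frame."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def incipient_speech_lead_ms(
    buffered: List[Frame],
    now_ms: int,
    energy_threshold: float = 1e-3,  # assumed value, would need tuning
    min_active_frames: int = 2,      # assumed value, suppresses single clicks
) -> Optional[int]:
    """Return the time in ms until the first speech-like run is played out,
    or None if no such run is waiting in the buffer."""
    run_start = 0
    run_len = 0
    for frame in sorted(buffered, key=lambda f: f.playout_ms):
        if frame.playout_ms <= now_ms:
            continue  # already handed to the output device
        if frame_energy(frame.samples) >= energy_threshold:
            if run_len == 0:
                run_start = frame.playout_ms
            run_len += 1
            if run_len >= min_active_frames:
                return run_start - now_ms
        else:
            run_len = 0  # run interrupted by a quiet frame
    return None


def participants_about_to_speak(
    buffers: Dict[str, List[Frame]],
    now_ms: int,
    consent: Dict[str, bool],
) -> Dict[str, int]:
    """Map participant id to look-ahead time in ms. Participants who have
    not permitted analysis of their audio are skipped entirely."""
    result: Dict[str, int] = {}
    for participant, frames in buffers.items():
        if not consent.get(participant, False):
            continue
        lead = incipient_speech_lead_ms(frames, now_ms)
        if lead is not None:
            result[participant] = lead
    return result
```

A client could call `participants_about_to_speak` on every playout tick and activate a visual, audio, or tactile indicator for each returned participant. The look-ahead time available is bounded by the jitter buffer delay, which is the delay the disclosure aims to exploit.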
Walter, Oliver; Lindstrom, Fredric; and Creusen, Ivo, "Prediction of incipient speech in audio/video conferences", Technical Disclosure Commons, (October 27, 2017)