The audio signal transmitted by the far end of a conference call or video conference is typically received at the near end as a composite signal that is the sum of the audio signals of the far-end participants. Far-end participants are therefore not spatially separable at the near end. This prevents near-end participants from using the brain's natural ability to focus on one voice among many (the cocktail party effect) to attend to the speech of particular far-end participants. This disclosure describes techniques, such as per-microphone audio channels and speech diarization, that distinguish far-end participants so that their audio is spatially separated at the near end.
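As a rough illustration of what spatial separation at the near end could look like, the following minimal sketch assumes the far-end participants have already been distinguished into one mono stream each (for example by per-microphone channels or diarization) and places each stream at a different stereo position. Constant-power panning is an assumption chosen for simplicity, not a technique named in the disclosure, and the function names `pan_gains` and `spatialize` are hypothetical.

```python
import math


def pan_gains(index, count):
    """Return (left, right) constant-power gains for participant `index` of `count`."""
    # Assumption: participants are spread evenly from hard left (0.0) to hard right (1.0).
    position = 0.5 if count == 1 else index / (count - 1)
    angle = position * math.pi / 2
    # cos^2 + sin^2 = 1, so perceived loudness stays roughly constant across positions.
    return math.cos(angle), math.sin(angle)


def spatialize(streams):
    """Mix per-participant mono sample lists into a list of (left, right) frames."""
    count = len(streams)
    length = max((len(samples) for samples in streams), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for index, samples in enumerate(streams):
        gain_left, gain_right = pan_gains(index, count)
        for n, sample in enumerate(samples):
            left[n] += gain_left * sample
            right[n] += gain_right * sample
    return list(zip(left, right))


# Usage: two hypothetical participants, the first panned left, the second right.
stereo = spatialize([[0.5, 0.5, 0.5], [0.25, 0.25, 0.25]])
```

Simple stereo panning only separates participants along a left-right axis; a real system might instead use head-related transfer functions or multi-loudspeaker rendering, which this sketch does not attempt.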
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Zwiener, Jakob; Lance, Marcos Calvo; and Kriz, Jakub, "Spatially Separating Participant Audio in a Conference Call", Technical Disclosure Commons, (April 21, 2020)