Abstract

Traditional subtitles lack the ability to convey the spatial origin of sounds, making it difficult to distinguish between speakers or follow complex audiovisual scenes. This limitation impacts comprehension, immersion, and accessibility, particularly in multi-speaker or action-heavy contexts. This disclosure describes techniques to enhance video content via expressive captions that integrate directional sound localization and speaker differentiation. Using microphone arrays or inferred localization data, the approach identifies the spatial origin of sound sources in real time. Speaker diarization distinguishes speakers based on direction. Subtitles are extended with spatial metadata to dynamically link text to positions on the screen. Expressive captions incorporate visual elements such as directional arrows, color codes, and augmented reality overlays to effectively convey sound direction and speaker identity. This approach improves comprehension, enhances immersion, and increases accessibility, providing a richer and more intuitive viewing experience across video streaming platforms, conferencing tools, and accessibility services.
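The sketch below illustrates, in Python, one way the spatial metadata described above might be attached to caption cues. All names (SpatialCue, screen_position, render_hint), the azimuth range, and the azimuth-to-position mapping are illustrative assumptions rather than part of the disclosure; the sketch simply pairs a diarized speaker label and an estimated sound direction with caption text, then derives a screen anchor, a per-speaker color key, and a directional arrow for rendering.

    # Minimal sketch (hypothetical structures, not from the disclosure): a subtitle
    # cue extended with spatial metadata, mapped to rendering hints.

    from dataclasses import dataclass

    @dataclass
    class SpatialCue:
        start: float        # cue start time, seconds
        end: float          # cue end time, seconds
        text: str           # transcribed caption text
        speaker_id: str     # label from speaker diarization
        azimuth_deg: float  # estimated sound direction, -90 (far left) to +90 (far right)

    def screen_position(azimuth_deg: float) -> float:
        """Map an azimuth estimate to a normalized horizontal position (0 = left, 1 = right)."""
        clamped = max(-90.0, min(90.0, azimuth_deg))
        return (clamped + 90.0) / 180.0

    def render_hint(cue: SpatialCue) -> dict:
        """Produce rendering hints: screen anchor, per-speaker color key, and direction arrow."""
        arrow = "<" if cue.azimuth_deg < -15 else ">" if cue.azimuth_deg > 15 else ""
        return {
            "text": f"{arrow} [{cue.speaker_id}] {cue.text}".strip(),
            "x": screen_position(cue.azimuth_deg),  # horizontal anchor, fraction of screen width
            "color_key": cue.speaker_id,            # consistent color per diarized speaker
        }

    # Example: a speaker localized to the left of the scene.
    cue = SpatialCue(start=12.0, end=14.5, text="Over here!", speaker_id="Speaker 2", azimuth_deg=-60.0)
    print(render_hint(cue))
    # {'text': '< [Speaker 2] Over here!', 'x': 0.1666..., 'color_key': 'Speaker 2'}

In a real pipeline, the azimuth estimate would come from microphone-array localization or inferred localization data, and the rendering layer (player, conferencing client, or AR overlay) would consume hints like these alongside the caption timing.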

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
