Defensive Publications Series

Voice Embedding-Driven Media Playback Control for Speaker-Specific Navigation

Abstract

A technique is proposed for audio-synchronized personal media playback in a streaming interface. An audio signal associated with a media item is obtained. Voice activity data reflecting one or more voice characteristics of at least one speaker in the media item is extracted from the audio signal. A voice embedding representing those voice characteristics is generated and associated with at least one voice embedding cluster for the media item. One or more speaker-based playback operations are then performed on the media item based on the at least one voice embedding cluster.
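The flow described in the abstract can be illustrated with a minimal sketch. The abstract does not name an embedding model, a clustering method, or a playback API, so everything below is an assumption made for illustration: the `Segment` type, the greedy cosine-similarity clustering, the 0.8 threshold, and the function names `assign_clusters` and `next_start_for_speaker` are all hypothetical. Embeddings are assumed to have been computed already for each voiced segment.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A voiced span of the media item (hypothetical structure)."""
    start: float            # seconds from the beginning of the media item
    end: float
    embedding: List[float]  # assumed to come from some voice embedding model
    cluster: int = -1       # filled in by assign_clusters


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_clusters(segments: List[Segment], threshold: float = 0.8) -> int:
    """Greedy clustering (an assumed stand-in, not the disclosed method).

    Each segment joins the most similar existing centroid if the similarity
    is at least `threshold`; otherwise it opens a new cluster.
    Returns the number of clusters found.
    """
    centroids: List[List[float]] = []
    counts: List[int] = []
    for seg in segments:
        best, best_sim = -1, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(seg.embedding, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best < 0:
            centroids.append(list(seg.embedding))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the running mean of the matched centroid.
            n = counts[best]
            centroids[best] = [
                (c * n + x) / (n + 1)
                for c, x in zip(centroids[best], seg.embedding)
            ]
            counts[best] = n + 1
        seg.cluster = best
    return len(centroids)


def next_start_for_speaker(
    segments: List[Segment], cluster: int, position: float
) -> Optional[float]:
    """Start time of the next segment of `cluster` after `position`, if any."""
    starts = [
        s.start for s in segments if s.cluster == cluster and s.start > position
    ]
    return min(starts) if starts else None


if __name__ == "__main__":
    # Toy two-dimensional embeddings for two alternating speakers.
    demo = [
        Segment(0.0, 4.0, [1.0, 0.0]),
        Segment(4.0, 9.0, [0.0, 1.0]),
        Segment(9.0, 12.0, [0.9, 0.1]),
        Segment(12.0, 15.0, [0.1, 0.95]),
    ]
    assign_clusters(demo)
    # A "skip to this speaker's next turn" operation becomes a seek target.
    print(next_start_for_speaker(demo, cluster=0, position=1.0))
```

In this sketch a speaker-based playback operation reduces to a lookup: once each segment carries a cluster label, "jump to the next turn of this speaker" is the earliest segment start in that cluster after the current position. A real system would likely need a more robust clustering step and a threshold tuned to the chosen embedding model.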

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Gaikar, Kshitij Suresh and Jassal, Manisha, "Voice Embedding-Driven Media Playback Control for Speaker-Specific Navigation", Technical Disclosure Commons, (June 04, 2026)
https://www.tdcommons.org/dpubs_series/10353
