Abstract
A technique is proposed for audio-synchronized personal media playback in a streaming interface. An audio signal associated with a media item is obtained. Voice activity data reflecting one or more voice characteristics associated with at least one speaker of the media item is extracted from the audio signal. A voice embedding representing the one or more voice characteristics associated with the at least one speaker is generated. The voice embedding is associated with at least one voice embedding cluster for the media item. One or more speaker-based playback operations are performed with respect to the media item based on the at least one voice embedding cluster.
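The clustering and playback steps described above might be sketched roughly as follows. This is an illustration under stated assumptions, not the method of the disclosure: the abstract does not specify an embedding model, a clustering algorithm, or a playback API, so the sketch assumes per-segment voice embeddings have already been produced by some upstream model, uses a simple greedy cosine-similarity clustering as a stand-in, and treats "jump to the next segment by a given speaker" as one possible speaker-based playback operation. The names `Segment`, `assign_clusters`, `next_segment_for_speaker`, and the `threshold` value are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """A voiced span of the media item with a precomputed voice embedding (assumed input)."""
    start: float       # seconds from the beginning of the media item
    end: float
    embedding: tuple   # hypothetical fixed-length voice embedding


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def assign_clusters(segments, threshold=0.8):
    """Greedy clustering (an assumed stand-in, not the disclosed algorithm).

    Each segment joins the most similar existing cluster whose centroid has
    cosine similarity >= threshold, otherwise it starts a new cluster.
    Returns one integer cluster label per segment.
    """
    centroids = []  # per cluster: (sum of member embeddings, member count)
    labels = []
    for seg in segments:
        best, best_sim = None, threshold
        for i, (total, count) in enumerate(centroids):
            centroid = [t / count for t in total]
            sim = cosine(seg.embedding, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append((list(seg.embedding), 1))
            labels.append(len(centroids) - 1)
        else:
            total, count = centroids[best]
            updated = [t + e for t, e in zip(total, seg.embedding)]
            centroids[best] = (updated, count + 1)
            labels.append(best)
    return labels


def next_segment_for_speaker(segments, labels, speaker, position):
    """Start time of the first segment by `speaker` at or after `position`, else None.

    Assumes `segments` is ordered by start time.
    """
    for seg, label in zip(segments, labels):
        if label == speaker and seg.start >= position:
            return seg.start
    return None


# Toy two-dimensional embeddings chosen so that two speakers alternate.
segments = [
    Segment(0.0, 5.0, (1.0, 0.0)),
    Segment(5.0, 9.0, (0.0, 1.0)),
    Segment(9.0, 14.0, (0.9, 0.1)),
    Segment(14.0, 20.0, (0.1, 0.9)),
]
labels = assign_clusters(segments)
seek_to = next_segment_for_speaker(segments, labels, speaker=0, position=6.0)
```

In this toy input the segments alternate between two speakers, so a seek request for cluster 0 issued at 6.0 seconds would land on the segment starting at 9.0 seconds. A real system would likely use higher-dimensional embeddings and a more robust clustering method; the greedy pass is used here only because it keeps the sketch short.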
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Gaikar, Kshitij Suresh and Jassal, Manisha, "Voice Embedding-Driven Media Playback Control for Speaker-Specific Navigation", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10353