Defensive Publications Series

Identifying Conversation Segments of an Audiovisual Stream

Abstract

This disclosure describes lightweight techniques to determine which segments of an audio or audiovisual stream include conversations. Per the techniques, a voice activity detector (VAD) isolates super-segments, e.g., segments of relatively long duration. Each super-segment is split into smaller segments, which are mapped into an embedding space, and the resulting embeddings are clustered. A chunk of video is determined to include a conversation if (1) its length exceeds a threshold T and the number of segments it includes exceeds a threshold S; (2) it has at least N major clusters; and (3) at least M of the major clusters re-occur at least K times.
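The decision rule at the end of the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `is_conversation`, the default values for T, S, N, M and K, and the `major_fraction` parameter are all hypothetical. The abstract does not define a "major" cluster or a "re-occurrence", so the sketch assumes a cluster is major when it holds at least a given fraction of the chunk's segments, and that a cluster re-occurs each time it starts a new contiguous run in the time-ordered sequence of cluster labels. The VAD, embedding and clustering steps are taken as already done; the input is one cluster label per small segment.

```python
from collections import Counter
from itertools import groupby


def is_conversation(labels, duration_s, T=30.0, S=10, N=2, M=2, K=3,
                    major_fraction=0.1):
    """Apply the threshold rule to one chunk.

    labels: cluster label of each small segment, in time order.
    duration_s: chunk length in seconds.
    All default values and `major_fraction` are illustrative assumptions.
    """
    # Length threshold T and segment-count threshold S.
    if duration_s <= T or len(labels) <= S:
        return False

    # Assumption: a cluster is "major" if it holds at least
    # `major_fraction` of the chunk's segments.
    counts = Counter(labels)
    major = {c for c, n in counts.items() if n >= major_fraction * len(labels)}
    if len(major) < N:
        return False

    # Assumption: a cluster "re-occurs" each time it starts a new
    # contiguous run (a turn) in the time-ordered label sequence.
    runs = Counter(label for label, _ in groupby(labels))
    recurring = sum(1 for c in major if runs[c] >= K)
    return recurring >= M


# Two clusters that alternate several times within a 60-second chunk.
dialogue = [0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1]
print(is_conversation(dialogue, duration_s=60.0))
```

Under these assumptions, a chunk with a single cluster, or with two clusters that never alternate, is rejected even when it is long enough, which matches the intent of requiring major clusters to re-occur.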

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Chen, Tongzhou and Audhkhasi, Kartik, "Identifying Conversation Segments of an Audiovisual Stream", Technical Disclosure Commons, (August 25, 2025)
https://www.tdcommons.org/dpubs_series/8502
