Abstract

Subtitles or closed captions for the audio in video content are an important accessibility and comprehension feature of video hosting platforms. Generating captions manually can be time-consuming, expensive, and difficult to scale. Captions generated via automated speech recognition (ASR) can be noisy and inaccurate, and depend on the quality of the video. This disclosure describes the use of content fingerprinting techniques to detect near, partial, or exact duplicates of a newly added video in a large corpus. Captions from the matching content are then automatically transferred to the newly added video, rather than being generated from scratch. Alternatively, captions for all matching content are aggregated to generate a caption stream for the entire cluster. The described techniques can improve the quality and consistency of captions for video content, thus improving accessibility and understandability of the content.
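
The abstract does not specify a fingerprinting or alignment algorithm, so the following is only a minimal sketch of the caption-transfer idea under stated assumptions. It assumes each video has already been reduced to one integer fingerprint per fixed-length segment (a stand-in for a real perceptual audio or video fingerprint), finds the segment offset at which a new video best lines up with a reference video, and shifts the reference captions onto the new video's timeline. The names `Caption`, `best_offset`, `transfer_captions`, `segment_seconds`, and `min_overlap` are hypothetical and do not come from the disclosure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds from the start of the video
    end: float
    text: str


def best_offset(new_fp, ref_fp):
    """Return (offset, votes) for the segment shift that aligns the most fingerprints.

    offset = ref_index - new_index, so new segment i matches ref segment i + offset.
    Returns (None, 0) when the two fingerprint sequences share no values.
    """
    positions = {}
    for j, h in enumerate(ref_fp):
        positions.setdefault(h, []).append(j)
    votes = Counter()
    for i, h in enumerate(new_fp):
        for j in positions.get(h, []):
            votes[j - i] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


def transfer_captions(new_fp, ref_fp, ref_captions,
                      segment_seconds=1.0, min_overlap=0.5):
    """Copy captions from a matching reference video onto the new video's timeline.

    Assumption: a match is accepted when at least `min_overlap` of the new
    video's segments agree with the reference at a single offset.
    """
    offset, votes = best_offset(new_fp, ref_fp)
    if offset is None or votes / len(new_fp) < min_overlap:
        return []  # no confident match; fall back to ASR or manual captioning
    shift = offset * segment_seconds
    new_duration = len(new_fp) * segment_seconds
    transferred = []
    for c in ref_captions:
        start, end = c.start - shift, c.end - shift
        if end <= 0 or start >= new_duration:
            continue  # caption lies outside the portion shared with the new video
        transferred.append(Caption(max(start, 0.0), min(end, new_duration), c.text))
    return transferred


# Toy example: the new video is a partial duplicate starting 2 segments into the reference.
ref_fp = [10, 11, 12, 13, 14, 15, 16, 17]
new_fp = [12, 13, 14, 15]
ref_captions = [
    Caption(0.0, 2.0, "intro"),
    Caption(2.5, 4.0, "hello"),
    Caption(5.0, 7.0, "world"),
]
result = transfer_captions(new_fp, ref_fp, ref_captions)
```

In this toy example the "intro" caption is dropped because it ends before the shared portion begins, and the "world" caption is clipped to the end of the shorter new video. A production system would need fingerprints that tolerate re-encoding and cropping, and the cluster-level aggregation mentioned in the abstract would require an additional step that merges captions from several matching videos.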

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Sharifi, Matthew, "Efficient Automated Video Captioning Based On Content Fingerprinting", Technical Disclosure Commons, (July 13, 2021)
https://www.tdcommons.org/dpubs_series/4442

Technical Disclosure Commons

Defensive Publications Series

Efficient Automated Video Captioning Based On Content Fingerprinting
