Subtitles or closed captions for the audio in video content are an important accessibility and comprehension feature of video hosting platforms. Generating captions manually can be time-consuming, expensive, and difficult to scale. Captions generated via automated speech recognition (ASR) can be noisy and inaccurate, and their accuracy depends on the quality of the video. This disclosure describes the use of content fingerprinting techniques to detect near, partial, or exact duplicates of a newly added video in a large corpus. Captions from the matching content are then automatically transferred to the newly added video, rather than being generated from scratch. Alternatively, captions for all matching content are aggregated to generate a single caption stream for the entire cluster of matching videos. The described techniques can improve the quality and consistency of captions for video content, thus improving accessibility and understandability of the content.
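The following is a minimal sketch of the match-and-transfer idea described above, not the method of the disclosure itself. It assumes each video has already been reduced to a list of per-frame integer features (a stand-in for a real audio or visual fingerprint), and it uses a simple sliding-window index with offset voting as the matching technique. The names `Caption`, `Corpus`, `best_match`, `transfer_captions`, `WINDOW`, `min_votes`, and `seconds_per_frame` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

WINDOW = 3  # assumed number of consecutive frames per fingerprint


@dataclass(frozen=True)
class Caption:
    start: float  # seconds from the start of the video
    end: float
    text: str


def fingerprints(frames):
    """Yield (fingerprint, frame_index) for every sliding window of frames."""
    for i in range(len(frames) - WINDOW + 1):
        yield tuple(frames[i:i + WINDOW]), i


class Corpus:
    """Toy inverted index from fingerprint to (video_id, frame_index)."""

    def __init__(self):
        self.index = defaultdict(list)
        self.captions = {}

    def add(self, video_id, frames, captions):
        for fp, i in fingerprints(frames):
            self.index[fp].append((video_id, i))
        self.captions[video_id] = list(captions)

    def best_match(self, frames, min_votes=2):
        """Return (video_id, frame_offset) with the most agreeing windows."""
        votes = Counter()
        for fp, i in fingerprints(frames):
            for video_id, j in self.index.get(fp, ()):
                # Windows from the same aligned segment share one offset.
                votes[(video_id, j - i)] += 1
        if not votes:
            return None
        match, count = votes.most_common(1)[0]
        return match if count >= min_votes else None


def transfer_captions(corpus, frames, seconds_per_frame=1.0):
    """Copy captions from the best match, shifted onto the new timeline."""
    match = corpus.best_match(frames)
    if match is None:
        return []  # no duplicate found; a real system might fall back to ASR
    video_id, offset = match
    shift = offset * seconds_per_frame
    duration = len(frames) * seconds_per_frame
    transferred = []
    for c in corpus.captions[video_id]:
        start, end = c.start - shift, c.end - shift
        if end > 0 and start < duration:  # keep only overlapping captions
            transferred.append(
                Caption(max(start, 0.0), min(end, duration), c.text)
            )
    return transferred
```

In this sketch, a new video that is a partial copy of an indexed one votes consistently for a single frame offset, and captions from the matched video are shifted by that offset and clipped to the new video's duration. Exact-tuple matching only tolerates exact copies of a segment; handling near duplicates would require a fingerprint that is robust to re-encoding, which is outside the scope of this example.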
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Sharifi, Matthew, "Efficient Automated Video Captioning Based On Content Fingerprinting", Technical Disclosure Commons, (July 13, 2021)