Abstract
A media item and a transcript of the media item are received. A plurality of media tokens is generated based on the media item. A plurality of transcript tokens is generated based on the transcript. A subset of the plurality of media tokens and a subset of the plurality of transcript tokens are input on a rolling basis into a transformer model. A sequence of probability distributions for the subset of the plurality of transcript tokens is obtained from the transformer model. With each submission, the transcript token having the highest probability is selected from the subset of the plurality of transcript tokens. A predetermined number of selected transcript tokens are combined with the transcript based on times associated with the tokens.
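The rolling selection described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `rolling_align` and `toy_score`, the window size, and the representation of a media token as a (time, label) pair are all assumptions made for illustration, and `toy_score` is a hypothetical stand-in for the transformer model that simply favors the transcript token matching the first media token in the window. The sketch also assumes that the time attached to a selected transcript token is the time of the first media token in its window.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed representation: a media token is a (time_in_seconds, label) pair.
MediaToken = Tuple[float, str]
Scorer = Callable[[Sequence[MediaToken], Sequence[str]], List[float]]


def toy_score(media_win: Sequence[MediaToken], text_win: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for the transformer model.

    Returns a softmax distribution over the transcript tokens in the window,
    favoring the token equal to the label of the first media token.
    """
    target = media_win[0][1]
    logits = [2.0 if tok == target else 0.0 for tok in text_win]
    total = sum(math.exp(x) for x in logits)
    return [math.exp(x) / total for x in logits]


def rolling_align(
    media_tokens: Sequence[MediaToken],
    transcript_tokens: Sequence[str],
    score: Scorer,
    window: int = 2,
    max_selected: Optional[int] = None,
) -> List[Tuple[float, str]]:
    """Slide a window over both token streams; keep the most probable token per window."""
    selected: List[Tuple[float, str]] = []
    steps = min(len(media_tokens), len(transcript_tokens)) - window + 1
    for start in range(max(steps, 0)):
        media_win = media_tokens[start:start + window]
        text_win = transcript_tokens[start:start + window]
        probs = score(media_win, text_win)
        # Index of the transcript token with the highest probability.
        best = max(range(len(text_win)), key=probs.__getitem__)
        # Assumption: the selected token takes the time of the window's first media token.
        selected.append((media_win[0][0], text_win[best]))
        if max_selected is not None and len(selected) >= max_selected:
            break  # stop once the predetermined number of tokens is reached
    return selected


def attach_times(selected: Sequence[Tuple[float, str]]) -> List[str]:
    """Combine selected tokens with their times into timestamped transcript lines."""
    return [f"[{t:.1f}s] {tok}" for t, tok in selected]


# Made-up example data, for illustration only.
media = [(0.0, "hello"), (0.5, "world"), (1.1, "again"), (1.8, "friends")]
transcript = ["hello", "world", "again", "friends"]
aligned = rolling_align(media, transcript, toy_score, window=2)
lines = attach_times(aligned)
```

With the made-up data above, `aligned` holds one (time, token) pair per window position and `lines` holds the corresponding timestamped transcript entries. Passing `max_selected` caps the number of selected tokens, mirroring the "predetermined number" in the abstract.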
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Shin, Dongeek, "Times Synchronization of Media and Transcript", Technical Disclosure Commons, (November 11, 2024)
https://www.tdcommons.org/dpubs_series/7518