Abstract
A television (TV) application may present various types of media content of interest to a user. The media content may have different formats such as streaming video and audio. The types of media content may include, but are not limited to, movies, television shows, live sporting events, news items, short form videos, and music. In addition, or in the alternative, a variety of media content providers may deliver various types of media content for viewing by the user. The TV application may deliver a customized viewing experience to a user that spans the diverse types of media content provided by the variety of media content providers.
A limitation of current media consumption is the difficulty of quickly grasping or recalling movie plot points, especially when joining a media content item that is already in progress or when resuming a media content item after a long break. In addition, though a recap summary may be generated, the summary may be available only in studio-supported languages. The disclosed technology addresses these limitations by generating a localized recap summary of a media content item in any language. It does so by utilizing generative artificial intelligence (Gen AI) to process subtitles for the media content item, enabling dynamic generation of the summary up to the user's current viewing point. The purpose of the disclosed technology is to offer on-demand, language-agnostic plot comprehension without disrupting the viewing flow for the media content item.
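The recap idea can be illustrated with a minimal sketch. It assumes subtitles are available as timed cues and that the recap request is expressed as a text prompt; the names `SubtitleCue`, `cues_up_to`, and `build_recap_prompt` are hypothetical and do not come from the disclosure, and the call to an actual Gen AI model is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class SubtitleCue:
    start_s: float  # cue start time, in seconds from the start of the item
    end_s: float    # cue end time, in seconds
    text: str


def cues_up_to(cues, position_s):
    """Keep only the cues that ended at or before the playback position."""
    return [c for c in cues if c.end_s <= position_s]


def build_recap_prompt(cues, position_s, language):
    """Build a prompt asking for a recap in the requested language.

    Only dialogue the viewer has already passed is included, so the
    prompt itself carries no later plot points.
    """
    transcript = "\n".join(c.text for c in cues_up_to(cues, position_s))
    return (
        f"Summarize the plot so far in {language}. "
        "Use only the dialogue below.\n\n"
        f"{transcript}"
    )


# Illustrative data only; these cues are invented for the example.
cues = [
    SubtitleCue(0.0, 4.0, "Welcome to the harbor."),
    SubtitleCue(5.0, 9.0, "The ship leaves at dawn."),
    SubtitleCue(600.0, 604.0, "The captain was the thief."),
]
prompt = build_recap_prompt(cues, position_s=120.0, language="Hindi")
```

Filtering by the current viewing point before prompting is one plausible way to keep the recap free of spoilers; the disclosure does not state how the cutoff is enforced.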
Media consumption experiences of a user may also lack effective mechanisms for the user to gain context or detailed insights into ongoing media content, particularly when joining the viewing of a media content item that is playing or when seeking specific information about the media content item that is playing without interrupting the viewing experience of other users. The disclosed technology addresses the lack of effective mechanisms by using a generative artificial intelligence (Gen AI) model to provide real-time, context-aware insights for the media content item that are based on available subtitle data for the media content item. For example, a computing device of a user (e.g., a smartphone) may interact with a media playback system (e.g., a television application running on a television) to transmit user queries and corresponding media timestamps to a server computer interfaced with the media playback system. The server computer may process the queries using the Gen AI model, with the subtitle data for the media content item providing contextual input to the model. The disclosed technology can provide an interactive question-and-answer interface to the user that can provide summaries, character information, and/or explanations of complex scenes in the media content item that are accessible on a companion computing device of the user or that are displayed on a display device of the media playback system. The purpose of the disclosed technology is to enhance a user’s understanding and engagement with a media content item while viewing the media content item.
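The query flow described above might look roughly like the following sketch. The JSON field names, the `handle_query` function, and the `(start_s, text)` subtitle layout are all assumptions made for illustration; the model is passed in as a plain callable so that no particular Gen AI API is implied.

```python
import json
from typing import Callable


def make_query_payload(content_id: str, timestamp_s: float, question: str) -> str:
    """Serialize a query as a companion device might send it (hypothetical fields)."""
    return json.dumps(
        {"content_id": content_id, "timestamp_s": timestamp_s, "question": question}
    )


def handle_query(payload: str, subtitles: dict, model: Callable[[str], str]) -> str:
    """Server side: pick subtitle context up to the timestamp, then ask the model."""
    request = json.loads(payload)
    cues = subtitles[request["content_id"]]  # list of (start_s, text) pairs
    context = [text for start_s, text in cues if start_s <= request["timestamp_s"]]
    prompt = (
        "Answer using only this dialogue:\n"
        + "\n".join(context)
        + f"\n\nQuestion: {request['question']}"
    )
    return model(prompt)


def stub_model(prompt: str) -> str:
    """Stand-in for a Gen AI model; it returns the prompt so it can be inspected."""
    return prompt


# Illustrative data only; the content ID and dialogue are invented.
subtitles = {
    "movie-1": [(10.0, "Mara hides the map."), (900.0, "Mara betrays the crew.")]
}
payload = make_query_payload("movie-1", 60.0, "Who is Mara?")
answer = handle_query(payload, subtitles, stub_model)
```

Sending the timestamp with each query lets the server scope the subtitle context to what the user has already seen, which is one reading of "corresponding media timestamps" in the text.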
Existing technology for providing media content for viewing by a user may lack the ability to leverage on-screen visual and audio context for responsive interaction of the user with the media content item while the user is viewing the media content item. The disclosed technology may extract information, such as image data or audio data, directly from a connected computing device that includes a display device or a speaker. A television application running on the connected computing device may provide the extracted information as contextual information in the form of a query to a generative artificial intelligence (Gen AI) model. The television application, using the Gen AI model, may provide a context-aware interactive experience to the user that enables the user to pose questions related to content actively being displayed or played by the television application on the connected computing device, thereby offering more relevant and integrated responses to the user.
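As a rough sketch of how extracted image or audio data could accompany a user question, the example below bundles raw bytes into a single query object. The `ContextualQuery` type, the `parts` layout, and the chosen MIME types are assumptions for illustration; the disclosure does not specify a wire format, and how the frame or audio is captured from the device is out of scope here.

```python
import base64
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextualQuery:
    question: str
    parts: list = field(default_factory=list)  # encoded media attachments


def build_contextual_query(
    question: str,
    frame_png: Optional[bytes] = None,
    audio_wav: Optional[bytes] = None,
) -> ContextualQuery:
    """Attach whatever on-screen or audio context is available to the question."""
    query = ContextualQuery(question=question)
    if frame_png is not None:
        query.parts.append(
            {
                "mime_type": "image/png",
                "data": base64.b64encode(frame_png).decode("ascii"),
            }
        )
    if audio_wav is not None:
        query.parts.append(
            {
                "mime_type": "audio/wav",
                "data": base64.b64encode(audio_wav).decode("ascii"),
            }
        )
    return query


# Placeholder bytes stand in for a captured frame and an audio clip.
frame = b"placeholder frame bytes"
clip = b"placeholder audio bytes"
query = build_contextual_query("Who is on screen right now?", frame, clip)
```

Base64 encoding is used here only so the media can travel inside a text-based request; a real system might send binary data directly.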
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Verma, Akash; Kumar, Vineet; Gupta, Abhay Kumar; Yadav, Vivek; Lu, Szu-An; Tsai, Meng-Ting; Liu, Ya Hsien; and Lien, Chu-Feng, "Gain Movie Insights Using Generative Artificial Intelligence", Technical Disclosure Commons, (July 23, 2025)
https://www.tdcommons.org/dpubs_series/8384