Abstract
Users who query a large language model (LLM) on a secondary computing device while media plays on a primary computing device often face a context gap: the model does not know what is on screen, so the user must typically describe it manually before asking a question. The disclosed technology provides a synchronization process in which the primary computing device transmits its media playback state and metadata to the secondary computing device over a local network. A silent handshake protocol retrieves a timestamp and a content identifier for the scene being played. This playback data is then used to fetch scene-specific metadata, such as actor identification and visual details, which is injected into the context window of an LLM hosted on the secondary computing device. As a result, the LLM can answer real-time natural language queries about the media content as it plays on the primary computing device, generating accurate responses without manual context provisioning.
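The flow described above (handshake reply, scene lookup, context injection) can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the JSON message shape, the field names `content_id` and `position_s`, the in-memory `SCENE_INDEX`, and all function names are hypothetical and chosen only for illustration. The network transport and the LLM call are left out.

```python
import json
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackState:
    content_id: str    # identifier of the media item (hypothetical field)
    position_s: float  # playback timestamp in seconds (hypothetical field)


@dataclass(frozen=True)
class Scene:
    start_s: float
    actors: tuple
    description: str


# Hypothetical scene metadata store, sorted by scene start time.
SCENE_INDEX = {
    "movie-123": [
        Scene(0.0, ("Actor A",), "Opening shot of a harbor at dawn."),
        Scene(95.0, ("Actor A", "Actor B"), "Two characters argue in a kitchen."),
    ]
}


def parse_playback_reply(raw: str) -> PlaybackState:
    """Parse the reply the primary device is assumed to send during the handshake."""
    msg = json.loads(raw)
    return PlaybackState(msg["content_id"], float(msg["position_s"]))


def fetch_scene(state: PlaybackState) -> Scene:
    """Return the scene whose start time is the latest one not after the timestamp."""
    scenes = SCENE_INDEX[state.content_id]
    starts = [s.start_s for s in scenes]
    i = bisect_right(starts, state.position_s) - 1
    return scenes[max(i, 0)]


def build_prompt(state: PlaybackState, scene: Scene, question: str) -> str:
    """Prepend scene metadata to the user's question before it reaches the LLM."""
    context = (
        f"Now playing: {state.content_id} at {state.position_s:.1f}s\n"
        f"On screen: {', '.join(scene.actors)}\n"
        f"Scene: {scene.description}\n"
    )
    return f"{context}\nUser question: {question}"


if __name__ == "__main__":
    reply = '{"content_id": "movie-123", "position_s": 120.5}'
    state = parse_playback_reply(reply)
    print(build_prompt(state, fetch_scene(state), "Who is on screen?"))
```

In this sketch the user types only the question; the content identifier, timestamp, and scene details come from the playback state, which is the manual step the synchronization process is meant to remove.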
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Agarwal, Nikita, "Media Playback State Synchronization for Large Language Models", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10384