Abstract
Users of media playback systems such as smart televisions often navigate complex, nested on-screen menus to adjust settings like display brightness, audio quality, or network configuration. This manual navigation can be cumbersome and unintuitive, which hinders efficient device control and configuration. A user may know the desired outcome, or be able to describe a current problem with the media system, yet lack the technical knowledge needed to locate and modify the specific settings that would resolve it.
This disclosure describes an agent-based artificial intelligence framework that enables natural language control and configuration of a media system. An on-device assistant component receives a natural language query from a user and gathers current device state data together with a directory of available system application functions. The assistant transmits this contextual data and the user query to a cloud-based server hosting a device control AI agent. The server-side agent uses a large language model to reason about the request, analyze the current device context, and select from the provided directory the application function, or sequence of application functions, to invoke.
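The request described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `AppFunction`, `AgentRequest`, `build_request`, `to_prompt`, and `validate_plan`, the field layout, and the example functions `set_brightness` and `set_picture_mode` are all hypothetical. The language model call itself is omitted; the sketch only shows how the query, device state, and function directory might be bundled, and how a returned plan could be checked against the directory.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AppFunction:
    """One entry in the directory of application functions (hypothetical schema)."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)


@dataclass
class AgentRequest:
    """Payload sent by the on-device assistant to the server-side agent (hypothetical schema)."""
    query: str
    device_state: dict
    functions: list


def build_request(query, device_state, functions):
    """Bundle the user query with the device state and the function directory."""
    return AgentRequest(query=query, device_state=device_state, functions=functions)


def to_prompt(request):
    """Serialize the request as JSON text that a language model could reason over."""
    return json.dumps(asdict(request), indent=2, sort_keys=True)


def validate_plan(plan, request):
    """Reject any selected function that is absent from the provided directory."""
    known = {f.name for f in request.functions}
    unknown = [step["function"] for step in plan if step["function"] not in known]
    if unknown:
        raise ValueError(f"functions not in directory: {unknown}")
    return plan


# Example: a vague complaint plus the context the agent would receive.
functions = [
    AppFunction("set_brightness", "Set display brightness", {"level": "int 0-100"}),
    AppFunction("set_picture_mode", "Switch picture preset", {"mode": "str"}),
]
request = build_request(
    "The picture looks too dark",
    {"brightness": 20, "picture_mode": "standard"},
    functions,
)
# A plan such as the agent might return; here it is written by hand.
plan = validate_plan([{"function": "set_brightness", "arguments": {"level": 60}}], request)
```

Validating the plan against the directory is one way to keep the agent's output limited to functions the device actually advertised; whether the disclosed system performs such a check is not stated in the abstract.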
After determining an execution plan, the server-side agent returns the target application functions and the generated response text to the on-device assistant. The media system then executes the specified application functions directly, through an application function manager within the operating system framework, to modify the target settings. The system subsequently provides visual and audio feedback to the user confirming the automated adjustment. The described architecture facilitates conversational device configuration, translating ambiguous user intents into concrete system-level actions while maintaining a standardized integration method across the device operating system.
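The execution step described above can be illustrated with a small dispatcher. This is a sketch under assumptions rather than the actual operating system component: `AppFunctionManager`, `register`, `execute`, `handle_agent_response`, the `set_brightness` handler, the 0 to 100 brightness range, and the response layout (`functions`, `response_text`) are hypothetical names chosen for the example.

```python
class AppFunctionManager:
    """Hypothetical stand-in for the OS-level application function manager."""

    def __init__(self, state):
        self.state = state      # current device settings
        self._handlers = {}     # function name -> callable

    def register(self, name, handler):
        """Expose a settings handler under a function name."""
        self._handlers[name] = handler

    def execute(self, name, arguments):
        """Invoke a registered function with the arguments chosen by the agent."""
        if name not in self._handlers:
            raise KeyError(f"unknown application function: {name}")
        return self._handlers[name](self.state, **arguments)


def set_brightness(state, level):
    """Example handler: clamp to an assumed 0-100 range and store the value."""
    state["brightness"] = max(0, min(100, level))
    return state["brightness"]


def handle_agent_response(manager, response):
    """Run each selected function in order, then return the text used for user feedback."""
    results = []
    for step in response["functions"]:
        results.append(manager.execute(step["name"], step.get("arguments", {})))
    return {"results": results, "feedback": response["response_text"]}


# Example: the assistant receives a plan and applies it on the device.
manager = AppFunctionManager({"brightness": 20})
manager.register("set_brightness", set_brightness)
response = {
    "functions": [{"name": "set_brightness", "arguments": {"level": 60}}],
    "response_text": "I raised the brightness to 60.",
}
outcome = handle_agent_response(manager, response)
```

In this sketch the feedback text is simply passed through from the agent; a real device would presumably render it on screen and speak it, as the abstract describes.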
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Xu, Bin; Chen, Chi; Bailey, Guillaume; Colta, Paul; Chen, Hongguang; Isbiliroglu, Mehmet Hakan; Ding, Kaile; Keeley, Michael; Kusch, Ryan; Kalukiewicz, Roman; Klein, Wolfram; du Plooy, Hugo; Soland, Michael; Fang, Quxiang; Huang, Xurui; Tang, Weiping; Huang, Zhiji; and Tsang, Juliana, "Multi-Agent Architecture for Diagnostics and Configuration of Media Systems", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10522