Inventor(s)

Shashwat Razdan

Abstract

The ability of intelligent agents to perform complex, cross-application tasks may be constrained by a dependency on application-specific programming interfaces (APIs), which can limit an agent's ability to interact with arbitrary user interfaces and to adapt to dynamic visual changes. A system can employ a closed-loop perception-action cycle in which a multimodal intelligent agent analyzes visual data from the screen of a host device, such as a smartphone or computer, to predict a next action. A control application can translate this action into a sequence of low-level human interface device (HID) events. A peripheral hardware actuator device may receive these events and inject them into the host device's operating system by emulating a standard input peripheral, for example, a keyboard or a pointing device. This approach can provide an application-agnostic method of control, enabling an agent to perform visually grounded, multi-step tasks across a graphical user interface without a dependency on specific APIs or software integrations.
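
The following is a minimal, illustrative sketch of the closed-loop perception-action cycle described above. All names (HidEvent, predict_next_action, send_to_actuator, the action schema) are assumptions introduced here for illustration and are not part of the published disclosure; the stubs mark where screen capture, model inference, and the transport to the hardware actuator would plug in.

```python
"""Sketch of a perception-action loop driving a HID-emulating actuator.

Assumed pipeline: capture screen -> multimodal agent predicts a high-level
action -> control application translates it into low-level HID events ->
peripheral actuator injects the events into the host OS.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class HidEvent:
    """A single low-level human interface device event."""
    kind: str      # e.g. "mouse_move", "mouse_click", "key_press"
    payload: dict  # event-specific fields (coordinates, key code, ...)


def capture_screen() -> bytes:
    """Capture the host device's screen as an image (platform-specific stub)."""
    raise NotImplementedError


def predict_next_action(screenshot: bytes, goal: str) -> dict:
    """Ask a multimodal agent for the next UI action (model-inference stub).

    A real implementation would send the screenshot and the task goal to a
    vision-language model and parse a structured response, for example
    {"type": "tap", "x": 540, "y": 1200} or {"type": "done"}.
    """
    raise NotImplementedError


def to_hid_events(action: dict) -> List[HidEvent]:
    """Translate a high-level action into a sequence of low-level HID events."""
    if action["type"] == "tap":
        return [
            HidEvent("mouse_move", {"x": action["x"], "y": action["y"]}),
            HidEvent("mouse_click", {"button": "left"}),
        ]
    if action["type"] == "type_text":
        return [HidEvent("key_press", {"char": c}) for c in action["text"]]
    return []


def send_to_actuator(events: List[HidEvent]) -> None:
    """Forward events to the peripheral actuator (transport stub).

    The actuator enumerates on the host as a standard USB keyboard or
    pointing device and injects each event into the operating system.
    """
    raise NotImplementedError


def run_task(goal: str, max_steps: int = 25) -> None:
    """Closed loop: observe the screen, act, and repeat until done."""
    for _ in range(max_steps):
        action = predict_next_action(capture_screen(), goal)
        if action["type"] == "done":
            break
        send_to_actuator(to_hid_events(action))
```

Because every step after the agent's prediction is expressed as generic HID events, the loop stays application-agnostic: no per-application API or software integration is required on the host.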

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
