Abstract
Computer control functionality refers to the capabilities that enable an artificial intelligence (AI) agent to use a computer as a human does, e.g., by interacting with user interfaces through input devices. For accurate task completion, the agent must reason about the task it is performing and determine what effects its actions may have on the environment. This disclosure describes techniques that enable an AI agent to learn to estimate the reversibility of an action in a user interface (UI) and to use that estimate to adapt how much reasoning it does before performing the action. Upon receiving a task, the agent performs a chain-of-thought procedure to predict an action and its reversibility. An action determined to be reversible is carried out; an action determined to be irreversible is first subjected to refinement or critique to ensure that it is correct. The effect of a carried-out action is stored as a training example. Inaccurate actions that are irreversible receive a higher training penalty than accurate actions or inaccurate actions that are reversible.
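The gating and penalty scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names Proposal, Outcome, choose_action, and training_penalty, the 0.5 threshold, and the penalty factor of 5.0 are all hypothetical, chosen only to make the control flow concrete.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    action: str          # hypothetical action label, e.g. "delete_file"
    p_reversible: float  # agent's estimated probability that the action can be undone


@dataclass
class Outcome:
    action: str
    correct: bool     # did the action advance the task as intended?
    reversible: bool  # could the observed effect be undone?


def choose_action(
    proposal: Proposal,
    refine: Callable[[Proposal], Proposal],
    threshold: float = 0.5,  # assumed cutoff; the source does not specify one
) -> Proposal:
    """Pass likely-reversible actions through; refine likely-irreversible ones first."""
    if proposal.p_reversible >= threshold:
        return proposal
    return refine(proposal)


def training_penalty(
    outcome: Outcome,
    base: float = 1.0,
    irreversible_factor: float = 5.0,  # assumed weighting, for illustration only
) -> float:
    """Penalize inaccurate irreversible actions more than any other outcome."""
    if outcome.correct:
        return 0.0
    return base if outcome.reversible else base * irreversible_factor
```

In this sketch the extra reasoning cost (the refine callback, standing in for refinement or critique) is paid only when the reversibility estimate falls below the threshold, and the asymmetric penalty pushes the learned estimate toward caution on actions that cannot be undone.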
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Hartmann, Florian and Carbune, Victor, "Computer Control by Agentic Systems with World Models of User Interfaces", Technical Disclosure Commons, (September 29, 2025)
https://www.tdcommons.org/dpubs_series/8656