Abstract

Image tokenization divides an image into multiple patches and embeds each patch into a vector space. It is important for enabling large language models (LLMs) to answer queries about an image effectively. A limitation of current image tokenization techniques, when applied to screenshots, is that patches are chosen without regard to user interface (UI) semantics; as a result, tokens carry information inefficiently and UI elements are split across tokens. This disclosure describes techniques that use the UI element tree to guide screenshot tokenization, leading to higher-quality screenshot tokens and better screenshot-based LLM inference. Given a screenshot and its corresponding UI element tree, tokenization proceeds by recursively traversing the tree, finding subtrees (children) whose size is under a threshold, allocating an image token for each such subtree, generating a screenshot for each such subtree, and transforming that screenshot into an embedding. The tokenized output can be used by a computer control agent or a virtual assistant to perform a task with reference to the user interface that the screenshot captures.
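The recursive traversal described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the disclosure: the `UINode` structure, measuring "size" as bounding-box pixel area, the `max_area` threshold, keeping an oversized leaf as a single region, and the `embed` placeholder, which stands in for whatever image encoder would actually be used. In this sketch, parts of an oversized parent that no child covers are simply dropped.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class UINode:
    """Hypothetical UI element tree node; the field names are assumptions."""
    name: str
    box: Box
    children: List["UINode"] = field(default_factory=list)


def area(box: Box) -> int:
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def select_regions(node: UINode, max_area: int) -> List[UINode]:
    """Return the subtrees that would each be allocated one image token."""
    # A subtree under the threshold is kept whole, so its UI element is not
    # split across tokens. A leaf is also kept (assumption), because the tree
    # offers no further way to subdivide it.
    if area(node.box) < max_area or not node.children:
        return [node]
    regions: List[UINode] = []
    for child in node.children:
        regions.extend(select_regions(child, max_area))
    return regions


def crop(pixels: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the sub-screenshot for one region out of a row-major pixel grid."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


def embed(patch: List[List[int]]) -> List[float]:
    """Placeholder encoder (assumption): width, height, and mean intensity."""
    height = len(patch)
    width = len(patch[0]) if patch else 0
    total = sum(sum(row) for row in patch)
    mean = total / (width * height) if width and height else 0.0
    return [float(width), float(height), mean]


def tokenize(pixels: List[List[int]], root: UINode, max_area: int):
    """Pair each selected region's name with the embedding of its crop."""
    return [(n.name, embed(crop(pixels, n.box)))
            for n in select_regions(root, max_area)]


# Toy 8x4 grayscale "screenshot" and a matching, made-up UI tree.
screen = [[x + y for x in range(8)] for y in range(4)]
root = UINode("window", (0, 0, 8, 4), [
    UINode("toolbar", (0, 0, 8, 1), [
        UINode("back", (0, 0, 2, 1)),
        UINode("title", (2, 0, 8, 1)),
    ]),
    UINode("content", (0, 1, 8, 4), [
        UINode("list", (0, 1, 4, 4)),
        UINode("detail", (4, 1, 8, 4)),
    ]),
])
tokens = tokenize(screen, root, max_area=16)
```

With these toy values the toolbar falls under the threshold and is kept as one region together with its two children, while the larger content pane is split into its list and detail children. Region boundaries therefore follow UI elements rather than a fixed grid.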

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Hartmann, Florian and Tran, Duc-Hieu, "Screenshot Tokenization Guided by User Interface Tree", Technical Disclosure Commons, (February 11, 2026)
https://www.tdcommons.org/dpubs_series/9321
