Abstract

Manually transcribing information from unstructured visual documents, such as utility bills or event flyers, into digital transactions can be inefficient and error-prone. A system is described for automating transactional workflows initiated from a visual input. A user may provide an image of a document to a multimodal large language model, which can perform semantic analysis of the image, extract relevant transactional data such as a payee and an amount, and infer the user's intent. This structured information can then be provided to an agentic controller that programmatically orchestrates a sequence of actions. These actions can include navigating a payment website, populating data fields, integrating with a payment service for user confirmation, and performing post-transaction tasks such as saving a digital receipt, thereby potentially reducing manual intervention in completing the transaction.
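The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the JSON field names, the `TransactionIntent` schema, and the controller's step list are assumptions, and the multimodal model call is replaced by a canned JSON response.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TransactionIntent:
    """Structured data extracted from the document image (illustrative schema)."""
    payee: str
    amount: float
    intent: str


def extract_intent(model_response: str) -> TransactionIntent:
    """Parse the multimodal model's (assumed) JSON output into structured fields."""
    fields = json.loads(model_response)
    return TransactionIntent(
        payee=fields["payee"],
        amount=float(fields["amount"]),
        intent=fields.get("intent", "pay_bill"),
    )


@dataclass
class AgenticController:
    """Orchestrates the transaction steps; each step is recorded for auditing."""
    log: list = field(default_factory=list)

    def run(self, tx: TransactionIntent, confirmed_by_user: bool) -> bool:
        # Navigate to the payee's payment site and populate fields.
        self.log.append(f"navigate: payment site for {tx.payee}")
        self.log.append(f"populate: amount={tx.amount:.2f}")
        # Hand off to a payment service for explicit user confirmation.
        if not confirmed_by_user:
            self.log.append("abort: user did not confirm")
            return False
        self.log.append("confirm: payment service authorized")
        # Post-transaction task: save a digital receipt.
        self.log.append("post: digital receipt saved")
        return True
```

For example, a simulated model response of `'{"payee": "City Utilities", "amount": "84.50"}'` parses into a `TransactionIntent`, and `AgenticController.run(tx, confirmed_by_user=True)` walks the navigate/populate/confirm/receipt steps, returning `False` without paying if the user declines.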

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
