Abstract

Systems that extract specific data from a web page, such as a transaction total, can struggle with accuracy and scalability when they rely on heuristic methods like regular expressions. The described technique uses a client-server system in which a client application on a computing device (e.g., a smartphone, smart watch, or laptop) generates a compact, text-based representation of a web page’s rendered content. This representation is transmitted to a remote server, where a large language model, optionally guided by an engineered prompt, analyzes the content to semantically identify and extract the desired data, such as a checkout amount. The server returns this data to the client in a structured format. Because it relies on contextual understanding rather than rigid, site-specific rules, this approach may improve extraction accuracy and scalability across websites and may reduce the maintenance burden associated with rule-based systems.
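
The flow described above can be illustrated with a minimal sketch. It is not the disclosed implementation, and every name in it is hypothetical: `compact_representation`, `build_prompt`, `extract_checkout_amount`, the prompt wording, and the JSON reply shape are all assumptions made for illustration. It also assumes static HTML as a stand-in for the rendered content, and it takes the language model as a plain callable so that no particular model API is implied.

```python
import json
from html.parser import HTMLParser
from typing import Callable, Optional


class _TextCollector(HTMLParser):
    """Collects visible text and skips script/style content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def compact_representation(html: str) -> str:
    """Client side (hypothetical): reduce a page to newline-separated text."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(collector.chunks)


def build_prompt(page_text: str) -> str:
    """Server side (hypothetical): wrap the page text in an engineered prompt."""
    return (
        "Identify the final checkout amount in the page text below.\n"
        'Reply with JSON only, shaped as {"amount": "<number>", '
        '"currency": "<code>"}.\n'
        'If no checkout amount is present, reply {"amount": null}.\n\n'
        "PAGE TEXT:\n" + page_text
    )


def extract_checkout_amount(
    page_text: str, complete: Callable[[str], str]
) -> Optional[dict]:
    """Server side (hypothetical): query the model and validate its reply.

    `complete` is an assumed callable that maps a prompt to the model's
    raw text reply; any real model client would be wrapped to fit it.
    """
    raw = complete(build_prompt(page_text))
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return None  # model did not return structured data
    if not isinstance(result, dict) or result.get("amount") is None:
        return None
    return {"amount": str(result["amount"]), "currency": result.get("currency")}
```

In this sketch the client would send the output of `compact_representation` to the server, and the server would return the dictionary from `extract_checkout_amount` (or an empty result) as the structured response. Validating the model's reply before returning it keeps malformed output from reaching the client.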

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
