Inventor(s)

HP INC

Abstract

This disclosure describes a method that lets a user extract the template of a document from a single picture of it. The method proceeds as follows:
1. The document image is identified in the scene (for instance, during mobile scanning).
2. The document image is cropped from the scene.
3. The document image is segmented into its regions, e.g., title, image, graphic, content text.
4. For each region, an algorithm (e.g., an ML model) extracts the region's template features, and an OCR engine extracts the region's text.
5. The template features are processed by another algorithm capable of either (a) matching those "unknown" features to known ones or (b) returning the template in a code format (e.g., LaTeX).
6. The template regions and extracted texts are joined in software that supports editing (e.g., Microsoft Word or TeXstudio), so the user can modify the result.
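Steps 5 and 6 can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the feature layout (font size, bold flag, centered flag), the `Region` class, the `KNOWN_TEMPLATES` table, and the nearest-neighbour matching rule are all assumptions made for illustration, and the regions are taken to arrive already segmented and OCR'd from steps 1 to 4.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Region:
    """One segmented region (hypothetical container for step 3 and 4 output)."""
    kind: str        # segment label, e.g. "title" or "content_text"
    features: tuple  # assumed features: (font size in pt, bold flag, centered flag)
    text: str        # text returned by the OCR engine


# Hypothetical library of known templates: name -> (feature vector, LaTeX wrapper).
KNOWN_TEMPLATES = {
    "title": ((24.0, 1.0, 1.0), "\\begin{center}{\\LARGE\\bfseries %s}\\end{center}"),
    "heading": ((16.0, 1.0, 0.0), "\\section*{%s}"),
    "body": ((11.0, 0.0, 0.0), "%s\\par"),
}


def match_template(features):
    """Step 5a: name of the known template closest (Euclidean) to `features`."""
    return min(KNOWN_TEMPLATES, key=lambda name: dist(KNOWN_TEMPLATES[name][0], features))


def to_latex(regions):
    """Steps 5b and 6: wrap each region's text in its matched LaTeX template."""
    body = "\n".join(
        KNOWN_TEMPLATES[match_template(r.features)][1] % r.text for r in regions
    )
    return "\\documentclass{article}\n\\begin{document}\n" + body + "\n\\end{document}\n"


if __name__ == "__main__":
    page = [
        Region("title", (22.0, 1.0, 1.0), "Quarterly Report"),
        Region("content_text", (10.5, 0.0, 0.0), "Revenue grew this quarter."),
    ]
    print(to_latex(page))
```

The resulting string could be saved as a `.tex` file and opened in an editor such as TeXstudio. A real system would also need to escape LaTeX special characters in the OCR text and would likely use learned features rather than three hand-picked numbers.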

Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 License.
