Structured content such as figures, tables, graphs, captions, and other graphical material often captures the essence of a document. Experienced readers often review this material first to grasp the contents of the document quickly. Identifying and extracting the structured content of a document, e.g., its graphical components, is therefore important in building a deeper semantic understanding of the document.

Techniques presented herein automatically extract the structured content of documents. Machine-learning techniques, e.g., object detection and other computer-vision methods, are used to recognize and extract the structured content. The techniques work well regardless of the tool used to create the document. For example, the document can be a PDF file, a screenshot, the output of a computer-aided design tool, etc. The techniques work across fields of study, publishing conventions, languages, and written scripts, and are robust to different formats of graphical content, e.g., vector and raster graphics.
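
As a rough illustration of the recognition step, the sketch below post-processes the output of an object detector that has already been run on a rendered page image. It is a minimal sketch under stated assumptions, not the method described herein: the `Detection` record, the label names, the thresholds, and the class-agnostic greedy non-maximum suppression are all hypothetical choices made for illustration, and the detector itself is out of scope.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x0, y0, x1, y1) in page-image pixels, origin at the top-left corner.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical record, not from the source)."""
    label: str    # e.g. "figure", "table", "caption" (assumed label set)
    score: float  # detector confidence in [0, 1]
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_regions(
    detections: List[Detection],
    min_score: float = 0.5,  # assumed confidence threshold
    max_iou: float = 0.5,    # assumed overlap threshold
) -> List[Detection]:
    """Keep confident detections and drop overlapping duplicates.

    Uses greedy, class-agnostic non-maximum suppression: candidates are
    visited from highest to lowest score, and a candidate is kept only if
    it does not overlap an already kept box by more than max_iou.
    """
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if det.score < min_score:
            break  # all remaining candidates score lower still
        if all(iou(det.box, k.box) <= max_iou for k in kept):
            kept.append(det)
    # Return regions in approximate reading order: top to bottom, then left to right.
    return sorted(kept, key=lambda d: (d.box[1], d.box[0]))
```

Because the selection operates only on boxes over a rendered page image, the same step applies whether the page originated as a PDF file, a screenshot, or vector or raster graphics. The boxes returned by `select_regions` could then be used to crop each region from the page image for downstream processing.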

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.