Abstract
Extracting structured data from tables in PDF documents can be difficult because conventional parsers may not fully preserve intricate layouts, merged cells, or associated metadata, which can lead to inaccurate inputs for downstream systems such as retrieval-augmented generation. A hybrid pipeline is described that can combine a conventional parser with a multimodal large language model (LLM). An initial parser can extract a table into a structured format, for example HTML. Then, in an iterative process, the multimodal LLM can review the HTML representation alongside an image of the source PDF page, first generating a critique of structural errors and then correcting them. This iterative cycle of critique and correction can produce an HTML representation of the table with improved structural accuracy. A final pass using the LLM may also be performed to extract surrounding contextual metadata, potentially improving the overall quality and completeness of the parsed data for subsequent use.
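The critique-then-correct loop described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and type names, the `max_rounds` cap, and the convention that the critique step returns `None` when it finds nothing to fix are all hypothetical, and the two stub functions stand in for real multimodal LLM calls, which the disclosure does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical interfaces; the disclosure does not define an API.
# Critic: (html, page_image) -> critique text, or None when no issues are found.
Critic = Callable[[str, bytes], Optional[str]]
# Corrector: (html, page_image, critique) -> corrected html.
Corrector = Callable[[str, bytes, str], str]


@dataclass
class RefinementResult:
    html: str             # table HTML after the last correction
    critiques: List[str]  # critiques produced, one per correction round


def refine_table_html(
    initial_html: str,
    page_image: bytes,
    critique: Critic,
    correct: Corrector,
    max_rounds: int = 3,  # assumed stopping cap; not stated in the source
) -> RefinementResult:
    """Alternate critique and correction until the critic is satisfied."""
    html = initial_html
    critiques: List[str] = []
    for _ in range(max_rounds):
        issue = critique(html, page_image)
        if not issue:
            break  # critic found nothing left to fix
        critiques.append(issue)
        html = correct(html, page_image, issue)
    return RefinementResult(html=html, critiques=critiques)


# Stubs standing in for multimodal LLM calls, for illustration only.
def stub_critique(html: str, page_image: bytes) -> Optional[str]:
    return None if "colspan" in html else "header cell is missing colspan"


def stub_correct(html: str, page_image: bytes, issue: str) -> str:
    return html.replace("<th>", '<th colspan="2">', 1)
```

In a real pipeline, the two callables would wrap requests to a multimodal model that receives both the HTML and the page image. Passing them in as parameters keeps the loop independent of any particular model or parser.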
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Kuligin, Leonid and Movsesyan, Grigory, "Hybrid Portable Document Format Table Parsing with Iterative Refinement by a Multimodal Large Language Model", Technical Disclosure Commons, (October 15, 2025)
https://www.tdcommons.org/dpubs_series/8719