Abstract
Large Language Models (LLMs) may struggle to process information from hypertext markup language (HTML) tables, because providing table data without its original on-page context can degrade comprehension and factual accuracy. A disclosed technology can address this with a two-step offline process. First, a context generation module can create a descriptive summary for a table by analyzing its page title, table title, and nearby caption text. Second, a linearization module can use this context to convert the structured HTML table into a descriptive, natural-language paragraph. The resulting linearized text passages may enable LLMs to better comprehend and integrate tabular information, potentially improving the relevance, factuality, and overall quality of generated summaries and answers for search and question-answering systems.
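The two-step process described above can be sketched in simplified form. The snippet below is an illustrative assumption, not the disclosed implementation: it uses a template to stand in for the context generation module (which the disclosure describes only at a high level) and a row-by-row sentence template for the linearization module, parsing the HTML table with Python's standard-library `html.parser`. All function and class names here (`generate_context`, `linearize`, `TableExtractor`) are hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects cell text from the rows of an HTML <table> fragment."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def generate_context(page_title, table_title, caption):
    # Step 1 (stand-in): build a descriptive summary from the table's
    # on-page context signals named in the disclosure.
    return f'Table "{table_title}" on the page "{page_title}": {caption}'


def linearize(table_html, context):
    # Step 2 (stand-in): convert the structured table into a
    # natural-language paragraph, prefixed with the generated context.
    parser = TableExtractor()
    parser.feed(table_html)
    header, *body = parser.rows
    sentences = []
    for row in body:
        pairs = ", ".join(f"{h} is {v}" for h, v in zip(header, row))
        sentences.append(f"For this entry, {pairs}.")
    return context + " " + " ".join(sentences)


table_html = (
    "<table>"
    "<tr><th>City</th><th>Population</th></tr>"
    "<tr><td>Oslo</td><td>709,037</td></tr>"
    "</table>"
)
context = generate_context(
    "Cities of Norway", "Major cities", "Populations of Norwegian cities."
)
passage = linearize(table_html, context)
print(passage)
```

A real system would likely replace both templates with LLM calls, but the shape of the pipeline (context first, then context-conditioned linearization into a text passage suitable for retrieval or question answering) is the same.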
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Dwivedi, Ashutosh, "Generating Context-Aware Linearized Representations of Tabular Data for Language Models", Technical Disclosure Commons, (August 19, 2025)
https://www.tdcommons.org/dpubs_series/8483