Abstract
Large Language Models (LLMs) may struggle to process information from hypertext markup language (HTML) tables, because providing table data without its original on-page context can degrade comprehension and factual accuracy. A disclosed technology can address this with a two-step offline process. First, a context generation module can create a descriptive summary for a table by analyzing its page title, table title, and nearby caption text. Second, a linearization module can use this context to convert the structured HTML table into a descriptive, natural-language paragraph. The resulting linearized text passages may enable LLMs to better comprehend and integrate tabular information, potentially improving the relevance, factuality, and overall quality of generated summaries and answers for search and question-answering systems.
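The two-step process described above can be sketched in simplified form. The snippet below is an illustrative assumption, not the disclosed implementation: it uses a template to stand in for the context generation module (which the disclosure describes only at a high level) and a row-by-row sentence template for the linearization module, parsing the HTML table with Python's standard-library `html.parser`. All function and class names here (`generate_context`, `linearize`, `TableExtractor`) are hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects cell text from the rows of an HTML <table> fragment."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def generate_context(page_title, table_title, caption):
    # Step 1 (stand-in): build a descriptive summary from the table's
    # on-page context signals named in the disclosure.
    return f'Table "{table_title}" on the page "{page_title}": {caption}'


def linearize(table_html, context):
    # Step 2 (stand-in): convert the structured table into a
    # natural-language paragraph, prefixed with the generated context.
    parser = TableExtractor()
    parser.feed(table_html)
    header, *body = parser.rows
    sentences = []
    for row in body:
        pairs = ", ".join(f"{h} is {v}" for h, v in zip(header, row))
        sentences.append(f"For this entry, {pairs}.")
    return context + " " + " ".join(sentences)


table_html = (
    "<table>"
    "<tr><th>City</th><th>Population</th></tr>"
    "<tr><td>Oslo</td><td>709,037</td></tr>"
    "</table>"
)
context = generate_context(
    "Cities of Norway", "Major cities", "Populations of Norwegian cities."
)
passage = linearize(table_html, context)
print(passage)
```

A real system would likely replace both templates with LLM calls, but the shape of the pipeline (context first, then context-conditioned linearization into a text passage suitable for retrieval or question answering) is the same.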
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Dwivedi, Ashutosh, "Generating Context-Aware Linearized Representations of Tabular Data for Language Models", Technical Disclosure Commons, (August 19, 2025)
https://www.tdcommons.org/dpubs_series/8483