Abstract
Large Language Models (LLMs) may produce syntactically invalid JavaScript Object Notation (JSON) outputs, which can lead to parsing failures in downstream applications. Such errors may include, for example, unbalanced braces, improper quoting, or unescaped characters, and can stem from the probabilistic nature of text generation. A relaxed parsing method is described to address these types of issues. Rather than performing strict, token-by-token validation, the described parser can identify a key-value pair, consume characters between a separating colon and a recognized end-of-value delimiter (such as a comma or closing brace) to define a value, and then attempt to repair certain errors within that value string. This approach may improve the resilience of systems that rely on LLM-generated data by enabling the extraction of content from some forms of malformed JSON, potentially reducing parsing failures and the need for resource-intensive retries.
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Fang, Fang (Anna), "A Relaxed Parsing Method for Correcting Malformed JavaScript Object Notation from Language Models", Technical Disclosure Commons, (October 15, 2025)
https://www.tdcommons.org/dpubs_series/8721