Abstract

Large Language Models (LLMs) may produce syntactically invalid JavaScript Object Notation (JSON) outputs, which can lead to parsing failures in downstream applications. Such errors may include, for example, unbalanced braces, improper quoting, or unescaped characters, and can stem from the probabilistic nature of text generation. A relaxed parsing method is described to address these types of issues. Rather than performing strict, token-by-token validation, the described parser can identify a key-value pair, consume characters between a separating colon and a recognized end-of-value delimiter (such as a comma or closing brace) to define a value, and then attempt to repair certain errors within that value string. This approach may improve the resilience of systems that rely on LLM-generated data by enabling the extraction of content from some forms of malformed JSON, potentially reducing parsing failures and the need for resource-intensive retries.
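The key-value scanning described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names (`relaxed_parse`, `parse_llm_json`), the restriction to a flat object with identifier-like keys, and the specific repair rules are assumptions made for illustration. In particular, the sketch assumes a comma ends a value only when the text after it looks like another key followed by a colon, and a closing brace ends a value only when nothing but whitespace follows it.

```python
import json
import re

# Assumption: keys are identifier-like and may or may not be quoted.
_KEY = re.compile(r'\s*"?([A-Za-z_][A-Za-z0-9_]*)"?\s*:')
_NUMBER = re.compile(r'-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?')


def _find_value_end(s: str, start: int) -> int:
    """Return the index of the delimiter that ends the value starting at `start`."""
    for i in range(start, len(s)):
        ch = s[i]
        # A comma counts as a delimiter only if another "key:" follows it.
        if ch == "," and _KEY.match(s, i + 1):
            return i
        # A closing brace counts only if it is the last non-blank character.
        if ch == "}" and not s[i + 1:].strip():
            return i
    # No delimiter found (e.g. missing closing brace): value runs to the end.
    return len(s)


def _repair_value(raw: str):
    """Best-effort conversion of one raw value string (illustrative rules only)."""
    text = raw.strip()
    if text in ("true", "false"):
        return text == "true"
    if text == "null":
        return None
    if _NUMBER.fullmatch(text):
        return float(text) if any(c in text for c in ".eE") else int(text)
    # Otherwise treat it as a string: drop one outer quote on each side,
    # keeping any unescaped quotes inside the value as literal characters.
    if text[:1] in ('"', "'"):
        text = text[1:]
    if text[-1:] in ('"', "'"):
        text = text[:-1]
    return text


def relaxed_parse(s: str) -> dict:
    """Extract key-value pairs from a flat, possibly malformed JSON object."""
    result = {}
    pos = s.find("{") + 1  # start after the opening brace, or at 0 if absent
    while (m := _KEY.match(s, pos)):
        end = _find_value_end(s, m.end())
        result[m.group(1)] = _repair_value(s[m.end():end])
        pos = end + 1
    return result


def parse_llm_json(s: str):
    """Try strict parsing first; fall back to the relaxed scan on failure."""
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        return relaxed_parse(s)
```

For example, `relaxed_parse('{"title": "He said "hi", ok", count: 3, "done": true')` returns `{'title': 'He said "hi", ok', 'count': 3, 'done': True}` despite the unescaped inner quotes, the unquoted key, and the missing closing brace. Trying the strict parser first keeps well-formed output on the standard path, so the relaxed scan runs only when strict parsing would otherwise force a retry.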

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Fang, Fang (Anna), "A Relaxed Parsing Method for Correcting Malformed JavaScript Object Notation from Language Models", Technical Disclosure Commons, (October 15, 2025)
https://www.tdcommons.org/dpubs_series/8721

Technical Disclosure Commons

Defensive Publications Series

A Relaxed Parsing Method for Correcting Malformed JavaScript Object Notation from Language Models
