Defensive Publications Series

Generating a Syntax-Aware Knowledge Graph from Heterogeneous Codebase Artifacts

Abstract

Generative artificial intelligence systems, such as large language models, may face limitations when applied to large codebases because a finite context window can hold only part of the code and may not capture its complex structure. A syntax-aware knowledge graph may be generated from heterogeneous codebase artifacts, including, for example, source code, documentation, and infrastructure definitions. An ingestion pipeline can use specialized parsers to identify code entities as nodes and their relationships, such as calls or uses, as edges, which together form a unified graph. This knowledge graph can serve as a data structure for artificial intelligence-powered coding assistants, which can query it for a compact, structure-aware context. This approach may improve contextual understanding for automated code generation and analysis, and can help mitigate issues related to context window size and high token consumption.
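As an illustration only, the node-and-edge extraction described above might be sketched as follows for a single Python source file. This is a minimal sketch under stated assumptions, not the implementation from the disclosure: it uses Python's standard `ast` module as the "specialized parser", covers only source code (not documentation or infrastructure definitions), and the names `build_graph`, `neighborhood`, the `defines`/`calls` edge labels, and the `demo` module are hypothetical.

```python
import ast


def build_graph(source: str, module_name: str):
    """Parse one Python module into (nodes, edges).

    Hypothetical schema: nodes map an id to a kind; edges are
    (source_id, relation, target_id) triples.
    """
    tree = ast.parse(source)
    nodes = {module_name: "module"}
    edges = set()

    # First pass: register every top-level function as a node.
    functions = [
        item for item in tree.body
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    for func in functions:
        func_id = f"{module_name}.{func.name}"
        nodes[func_id] = "function"
        edges.add((module_name, "defines", func_id))

    # Second pass: record calls between functions known to the graph.
    # Only plain-name calls (e.g. load(x)) are resolved in this sketch;
    # attribute calls and built-ins are skipped.
    for func in functions:
        caller = f"{module_name}.{func.name}"
        for sub in ast.walk(func):
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                callee = f"{module_name}.{sub.func.id}"
                if callee in nodes:
                    edges.add((caller, "calls", callee))
    return nodes, edges


def neighborhood(edges, node_id):
    """Return the edges touching node_id: a compact context for a query."""
    return sorted(e for e in edges if node_id in (e[0], e[2]))


SAMPLE = '''
def load(path):
    return open(path).read()

def parse(path):
    text = load(path)
    return text.split()
'''

nodes, edges = build_graph(SAMPLE, "demo")
context = neighborhood(edges, "demo.parse")
for edge in context:
    print(edge)
```

A coding assistant could pass the few triples returned by `neighborhood` to a model instead of the full file, which is one way to read the "compact, structure-aware context" in the abstract. A fuller pipeline would presumably add one parser per artifact type and merge the results into the same graph.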

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Kuligin, Leonid and Steiner, Marcello, "Generating a Syntax-Aware Knowledge Graph from Heterogeneous Codebase Artifacts", Technical Disclosure Commons, (June 23, 2026)
https://www.tdcommons.org/dpubs_series/10555
