Abstract
Generative artificial intelligence systems, such as large language models, may face limitations when applied to large codebases because finite context windows may be unable to represent complex code structures in full. A syntax-aware knowledge graph may be generated from heterogeneous codebase artifacts such as source code, documentation, and infrastructure definitions. An ingestion pipeline can use specialized parsers to identify code entities as nodes and the relationships between them, such as calls or uses, as edges, forming a unified graph. This knowledge graph can serve as a data structure for artificial intelligence-powered coding assistants, which can query it for compact, structurally aware context. This approach may improve contextual understanding for automated code generation and analysis, and can help mitigate issues related to limited context window size and high token consumption.
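The ingestion step described above can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes a single Python parser (the standard-library `ast` module) standing in for the specialized parsers, and the names `SAMPLE_SOURCE`, `build_graph`, and `context_for` are hypothetical. Functions become nodes, direct calls by name become "calls" edges, and a one-hop query returns a compact textual context for a given entity.

```python
import ast

# Hypothetical sample artifact; a real pipeline would read files from a repository.
SAMPLE_SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''


def build_graph(source):
    """Return (nodes, edges) for the functions defined in `source`.

    Nodes are function names; edges are (caller, "calls", callee) triples.
    Only calls made through a plain name are recorded, so method calls
    such as text.split() are skipped in this simplified sketch.
    """
    tree = ast.parse(source)
    nodes = set()
    edges = set()
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        nodes.add(func.name)
        for inner in ast.walk(func):
            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                edges.add((func.name, "calls", inner.func.id))
    return nodes, edges


def context_for(name, nodes, edges):
    """Collect the one-hop neighbourhood of `name` as compact context lines."""
    if name not in nodes:
        return []
    return sorted(f"{s} {rel} {t}" for s, rel, t in edges if name in (s, t))


nodes, edges = build_graph(SAMPLE_SOURCE)
for line in context_for("load", nodes, edges):
    print(line)
```

In this sketch, built-in callees such as `open` appear as edge targets without being nodes. A fuller pipeline would presumably resolve such targets, and would merge the output of other parsers (documentation, infrastructure definitions) into the same graph.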
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Kuligin, Leonid and Steiner, Marcello, "Generating a Syntax-Aware Knowledge Graph from Heterogeneous Codebase Artifacts", Technical Disclosure Commons, (June 23, 2026)
https://www.tdcommons.org/dpubs_series/10555