Abstract

Generative artificial intelligence systems, such as large language models, may face limitations when applied to large codebases, because their finite context windows may be unable to capture complex code structure. A syntax-aware knowledge graph may be generated from heterogeneous codebase artifacts, including, for example, source code, documentation, and infrastructure definitions. An ingestion pipeline can use specialized parsers to identify code entities as nodes and the relationships between them, such as calls or uses, as edges, forming a unified graph. This knowledge graph can serve as a data structure for coding assistants powered by artificial intelligence, allowing them to query for a compact, structurally aware context. This approach may improve contextual understanding for automated code generation and analysis, and can help mitigate issues related to context window size and high token consumption.
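As an illustration only, the following Python sketch shows one way the node/edge structure and a compact-context query could look. It assumes a single Python source file parsed with the standard-library `ast` module; `CodeGraph`, `ingest_python`, `build_context`, and the `"defines"` relation are hypothetical names chosen for this example, and the documentation and infrastructure artifacts mentioned above are not covered.

```python
import ast
from collections import defaultdict, deque


class CodeGraph:
    """Hypothetical minimal code knowledge graph: entities as nodes, typed relations as edges."""

    def __init__(self):
        self.nodes = {}                # node id -> {"kind": ..., "snippet": ...}
        self.edges = defaultdict(set)  # source id -> {(relation, target id), ...}

    def add_node(self, node_id, kind, snippet=""):
        self.nodes.setdefault(node_id, {"kind": kind, "snippet": snippet})

    def add_edge(self, src, relation, dst):
        self.edges[src].add((relation, dst))

    def neighborhood(self, start, max_hops=1):
        """Breadth-first search over outgoing edges, bounded by max_hops."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for _relation, dst in self.edges.get(node, ()):
                if dst not in seen:
                    seen.add(dst)
                    frontier.append((dst, depth + 1))
        return seen


def ingest_python(graph, module_name, source):
    """Hypothetical parser stage: functions become nodes, direct calls become 'calls' edges."""
    tree = ast.parse(source)
    graph.add_node(module_name, "module")
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    defined = {f.name for f in funcs}
    for func in funcs:
        func_id = f"{module_name}.{func.name}"
        graph.add_node(func_id, "function", ast.get_source_segment(source, func) or "")
        graph.add_edge(module_name, "defines", func_id)
        for call in ast.walk(func):
            # Only record calls by plain name to functions defined in this module;
            # builtins such as open() and method calls are ignored in this sketch.
            if (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
                    and call.func.id in defined):
                graph.add_edge(func_id, "calls", f"{module_name}.{call.func.id}")


def build_context(graph, entity_id, max_hops=1):
    """Join the source snippets of an entity and its nearby neighbours into one prompt context."""
    ids = sorted(graph.neighborhood(entity_id, max_hops))
    snippets = [graph.nodes[i]["snippet"] for i in ids
                if i in graph.nodes and graph.nodes[i]["snippet"]]
    return "\n\n".join(snippets)


SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))

def unrelated():
    return 42
'''

graph = CodeGraph()
ingest_python(graph, "app", SOURCE)
print(sorted(graph.neighborhood("app.main")))  # ['app.load', 'app.main', 'app.parse']
print(build_context(graph, "app.main"))
```

The bounded breadth-first traversal is what keeps the retrieved context compact in this sketch: a query for `app.main` returns only the entities within `max_hops` edges of it (here `load` and `parse`), so `unrelated` never reaches the prompt.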

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.