Abstract

Conventional vector-based retrieval-augmented generation (RAG) systems are structurally blind when applied to code migration: they sever critical architectural relationships such as class inheritance, which leads to non-functional generated code and high rates of API hallucination. The described approach instead models a codebase as an interconnected property graph. Source code is parsed into abstract syntax trees (ASTs), which are then mapped to a graph whose nodes are code entities and whose edges are the dependencies between them. A topological sort of this graph yields a dependency-aware translation queue that ensures parent classes are processed before their child classes. The method can also be combined with a semantic vector-search fallback to form a hybrid retrieval strategy. In either form, it provides a large language model (LLM) with architecturally aware context, which may significantly improve the functional integrity of the generated code and reduce structural errors during automated software modernization.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.