Abstract
Conventional vector-based retrieval-augmented generation (RAG) systems exhibit structural blindness when used for code migration: they sever critical architectural relationships such as class inheritance, which leads to non-functional generated code with high API hallucination rates. The approach described here models a codebase as an interconnected property graph. Source code is parsed into abstract syntax trees (ASTs), which are mapped to a graph whose nodes are code entities and whose edges are the dependencies between them. A topological sort of this graph produces a dependency-aware translation queue, ensuring that parent classes are processed before their child classes. The method can be combined with a semantic vector search fallback to form a hybrid retrieval strategy, and it provides a large language model (LLM) with architecturally aware context. This may significantly improve the functional integrity of the generated code and reduce structural errors during automated software modernization.
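The dependency-aware translation queue can be illustrated with a minimal sketch. It is not the disclosure's implementation: it assumes Python source as the input language, uses the standard-library `ast` module as a stand-in for a language-specific parser, tracks only inheritance edges between classes defined in the same source string, and uses hypothetical names (`inheritance_graph`, `translation_queue`, `SAMPLE`) throughout.

```python
import ast
from graphlib import TopologicalSorter


def inheritance_graph(source: str) -> dict[str, set[str]]:
    """Map each class in `source` to the locally defined classes it inherits from.

    Assumption: only simple-name bases (e.g. `class B(A)`) are tracked;
    a fuller property graph would also carry calls, imports, and so on.
    """
    tree = ast.parse(source)
    classes = [node for node in ast.walk(tree) if isinstance(node, ast.ClassDef)]
    defined = {cls.name for cls in classes}
    graph: dict[str, set[str]] = {}
    for cls in classes:
        bases = {base.id for base in cls.bases if isinstance(base, ast.Name)}
        # Keep only edges to classes defined in this source; external bases are ignored.
        graph[cls.name] = bases & defined
    return graph


def translation_queue(source: str) -> list[str]:
    """Order classes so that every parent appears before its children."""
    # TopologicalSorter expects node -> predecessors, which matches child -> parents.
    return list(TopologicalSorter(inheritance_graph(source)).static_order())


# Hypothetical sample input, deliberately written child-first.
SAMPLE = '''
class Checkout(PaymentService):
    pass

class PaymentService(BaseService):
    pass

class BaseService:
    pass
'''

queue = translation_queue(SAMPLE)
print(queue)  # ['BaseService', 'PaymentService', 'Checkout']
```

Although the sample declares the child classes first, the queue starts with `BaseService`, so a translated parent is available as context when each child is processed. A cyclic dependency would raise `graphlib.CycleError`, which a real pipeline would need to handle separately.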
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Bhaumik, Suddhasatwa; Jaiswal, Nilesh; Garg, Saurabh; Agrawal, Aniket; Malhotra, Divya; and Shukla, Arjit, "Mitigate Structural Blindness in Code Translation using a Graph-Based Representation", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10547