Abstract

Integration testing in continuous integration/continuous deployment (CI/CD) environments produces large volumes of logs containing many errors and warnings, many of which are benign or unrelated. Identifying the specific error that caused a test failure amidst this noise is a major challenge for developers seeking efficient and accurate debugging. This disclosure describes techniques that leverage large language models (LLMs) to automate and abstract root-cause analysis for test failures. A sequential, multi-step pipeline of artificial intelligence (AI) agents accepts as input a reference to a failing test run and produces a high-value, actionable output, e.g., a summary of the root cause and a suggested code fix. In contrast to traditional text diffing, critical errors are isolated by leveraging the LLM's ability to perform deeper semantic diffing. The multi-step pipeline, each component of which is an AI agent, includes context seeding, semantic comparison, summarization and root-cause identification, and automated fix suggestion.
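
As a rough illustration of the sequential pipeline described above, the sketch below shows one way the four agent steps could be chained in Python. The helper functions (call_llm), the prompts, and the data structure are hypothetical placeholders introduced here for clarity; they are not part of the disclosure or of any specific library.

```python
# Minimal sketch of a sequential agent pipeline for test-failure root-cause
# analysis. All names (call_llm, FailureReport) are hypothetical placeholders.

from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM completion API; plug in a provider."""
    raise NotImplementedError("Connect this to your model provider of choice.")


@dataclass
class FailureReport:
    root_cause_summary: str
    suggested_fix: str


def analyze_failed_run(failing_logs: str, passing_logs: str) -> FailureReport:
    # Step 1: context seeding -- assemble the failing run's logs together
    # with a known-good baseline so the later agents have material to compare.
    context = (
        "FAILING RUN LOGS:\n" + failing_logs +
        "\n\nPASSING BASELINE LOGS:\n" + passing_logs
    )

    # Step 2: semantic comparison -- ask the model to isolate errors that are
    # semantically new in the failing run, rather than doing a textual diff.
    new_errors = call_llm(
        "Compare the two log sets and list only errors that are semantically "
        "new or different in the failing run, ignoring benign noise.\n\n" + context
    )

    # Step 3: summarization and root-cause identification.
    root_cause = call_llm(
        "Given these isolated errors, summarize the most likely root cause "
        "of the test failure:\n\n" + new_errors
    )

    # Step 4: automated fix suggestion.
    fix = call_llm(
        "Suggest a concrete code fix addressing this root cause:\n\n" + root_cause
    )

    return FailureReport(root_cause_summary=root_cause, suggested_fix=fix)
```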

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.