Abstract

While large language models (LLMs) can generate code, the training of such models has not made use of the data generated during collaborative code review, a standard part of software development. This disclosure describes techniques that utilize historical code review data (including reviewer comments and the corresponding code edits) available within organization-internal code repositories to train LLMs to generate code. The historical code review data can be used for model fine-tuning, for reinforcement learning from human feedback (RLHF), and/or for prompt engineering. The trained model can be utilized to generate code from a code description provided via a prompt template. The prompt template can incorporate organization-specific factors such as developer guidelines, developer or team style, etc. Code generated by the LLM can be iteratively refined via human review as well as via analytical tools that check style compliance, code coverage, test success rate, comment conventions, etc.
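As a concrete illustration of the pipeline the abstract describes, the following Python sketch shows how a (reviewer comment, code edit) pair mined from review history might be converted into a fine-tuning example, and how a prompt template might fold in organization-specific factors. All names here (ReviewEvent, to_training_pair, build_prompt) are hypothetical and illustrative, not part of the disclosure.

    from dataclasses import dataclass

    @dataclass
    class ReviewEvent:
        """One reviewer comment and the code edit that addressed it,
        mined from an organization-internal code review history."""
        original_code: str
        reviewer_comment: str
        revised_code: str

    def to_training_pair(event: ReviewEvent) -> dict:
        """Map a review event to an instruction-tuning example:
        (code under review + reviewer comment) -> revised code."""
        return {
            "prompt": (
                f"Code under review:\n{event.original_code}\n\n"
                f"Reviewer comment:\n{event.reviewer_comment}\n\n"
                "Revise the code to address the comment."
            ),
            "completion": event.revised_code,
        }

    def build_prompt(description: str, guidelines: str, team_style: str) -> str:
        """Fill a prompt template with organization-specific factors
        (developer guidelines, team style) before asking the tuned
        model to generate code from a description."""
        return (
            f"Developer guidelines:\n{guidelines}\n\n"
            f"Team style notes:\n{team_style}\n\n"
            f"Task: write code for the following description.\n{description}"
        )

The same records could also serve as preference data for RLHF, for instance by treating the pre-edit code as the less preferred response and the post-edit code as the preferred one.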

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
