Code review is critical for ensuring software quality and maintainability. With the rapid growth in software scale and complexity, code review has become a bottleneck in the development process due to its time-consuming, knowledge-intensive nature and the shortage of experienced developers willing to review code. Several approaches have been proposed to automatically generate code reviews based on retrieval, neural machine translation, pre-trained models, or large language models (LLMs). These approaches mainly leverage historical code changes and review comments. However, much information crucial to code review, such as the context of code changes and prior review knowledge, has been overlooked. This paper proposes LAURA, an LLM-based review-knowledge-augmented, context-aware framework for code review generation. The framework integrates review exemplar retrieval, context augmentation, and systematic guidance to improve the performance of ChatGPT-4o and DeepSeek v3 in generating code review comments. In addition, given the prevalence of low-quality reviews in existing datasets, we also constructed a high-quality dataset. Experimental results show that with the two models, LAURA generates review comments that are either completely correct or at least helpful to developers in 42.2% and 40.4% of cases, respectively, significantly outperforming state-of-the-art baselines. Furthermore, our ablation studies demonstrate that every component of LAURA contributes positively to comment quality.