Code review, which aims to ensure the overall quality and reliability of software, is a cornerstone of software development. Unfortunately, while crucial, code review is a labor-intensive process that the research community seeks to automate. Existing automated methods rely on single input-output generative models and thus generally struggle to emulate the collaborative nature of code review. This work introduces \tool{}, a novel multi-agent Large Language Model (LLM) system for code review automation. CodeAgent incorporates a supervisory agent, QA-Checker, to ensure that all agents' contributions address the initial review question. We evaluate CodeAgent on four critical code review tasks: (1) detecting inconsistencies between code changes and commit messages, (2) identifying vulnerability introductions, (3) validating code style adherence, and (4) suggesting code revisions. The results demonstrate CodeAgent's effectiveness, establishing a new state of the art in code review automation. Our data and code are publicly available (\url{https://github.com/Code4Agent/codeagent}).