Recent advances in large language models (LLMs) have highlighted their potential for vulnerability detection, a crucial component of software quality assurance. Despite this progress, most studies adopt the perspective of a single role, usually that of a tester, and so lack the diverse viewpoints of the different roles in a typical software development life cycle, which includes both developers and testers. To address this gap, this paper introduces a multi-role approach in which LLMs act as different roles, simulating a real-life code review process and engaging in discussion to reach a consensus on the existence and classification of vulnerabilities in the code. A preliminary evaluation of this approach indicates a 13.48% increase in precision, an 18.25% increase in recall, and a 16.13% increase in F1 score.
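The multi-role discussion described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the names `Verdict`, `discuss`, `stub_tester`, and `stub_developer`, the round limit, and the fallback rule when no consensus is reached are all assumptions made for illustration, and the two roles are simple stubs standing in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    vulnerable: bool
    category: Optional[str]  # hypothetical label; None when not vulnerable
    rationale: str


# A role is any callable taking (code, transcript so far) and returning a Verdict.
# In practice each role would wrap an LLM prompted to act as that role.
Role = Callable[[str, List[Verdict]], Verdict]


def same(a: Verdict, b: Verdict) -> bool:
    """Two verdicts agree when existence and classification both match."""
    return (a.vulnerable, a.category) == (b.vulnerable, b.category)


def discuss(code: str, tester: Role, developer: Role, max_rounds: int = 3) -> Verdict:
    """Alternate tester and developer turns until their verdicts agree."""
    transcript: List[Verdict] = []
    tester_view = tester(code, transcript)
    transcript.append(tester_view)
    for _ in range(max_rounds):
        developer_view = developer(code, transcript)
        transcript.append(developer_view)
        if same(tester_view, developer_view):
            return developer_view
        tester_view = tester(code, transcript)
        transcript.append(tester_view)
    # Assumption: with no consensus, fall back to the tester's last verdict.
    return tester_view


def stub_tester(code: str, transcript: List[Verdict]) -> Verdict:
    # Stub standing in for an LLM tester: flags one unsafe C call by keyword.
    flagged = "strcpy(" in code
    return Verdict(flagged, "buffer-overflow" if flagged else None, "keyword heuristic")


def stub_developer(code: str, transcript: List[Verdict]) -> Verdict:
    # Stub standing in for an LLM developer: disputes at first, then concedes
    # to the latest verdict once the discussion has three or more turns.
    if len(transcript) >= 3:
        last = transcript[-1]
        return Verdict(last.vulnerable, last.category, "concedes after discussion")
    return Verdict(False, None, "believes the code is safe")


unsafe_result = discuss("strcpy(dst, src);", stub_tester, stub_developer)
safe_result = discuss("return a + b;", stub_tester, stub_developer)
```

In this sketch, consensus means that both roles agree on whether a vulnerability exists and on its category, mirroring the two points of agreement named in the abstract. The turn order and stopping rule are design choices of the sketch only.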