Recent advances in large language models (LLMs) have highlighted their potential for vulnerability detection, a crucial component of software quality assurance. Despite this progress, most studies have been limited to the perspective of a single role, usually testers, and lack the diverse viewpoints of the different roles in a typical software development lifecycle, including both developers and testers. To this end, this paper introduces an approach that employs LLMs acting as different roles to simulate a real-life code review process, engaging in discussions to reach a consensus on the existence and classification of vulnerabilities in the code. A preliminary evaluation of the proposed approach indicates a 4.73% increase in precision, a 58.9% increase in recall, and a 28.1% increase in F1 score.
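
The role-based discussion described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `discuss` function, the `Agent` signature, the `Result` type, the round limit, the `NOT_VULNERABLE` and `CWE-89` labels, and the two stub agents that stand in for real LLM calls. It shows only one plausible way to loop role-playing agents over a shared transcript until their verdicts agree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical agent signature: (code, transcript so far) -> verdict label.
# In a real system each agent would wrap an LLM call with a role-specific prompt.
Agent = Callable[[str, List[str]], str]


@dataclass
class Result:
    verdict: Optional[str]  # agreed label, or None if no consensus was reached
    consensus: bool
    rounds: int             # number of discussion rounds actually run


def discuss(code: str, agents: Dict[str, Agent], max_rounds: int = 3) -> Result:
    """Let each role speak in turn until all verdicts match or rounds run out."""
    transcript: List[str] = []
    for round_no in range(1, max_rounds + 1):
        verdicts = {}
        for role, agent in agents.items():
            verdicts[role] = agent(code, transcript)
            transcript.append(f"{role}: {verdicts[role]}")
        if len(set(verdicts.values())) == 1:
            # Every role gave the same label in this round.
            return Result(next(iter(verdicts.values())), True, round_no)
    return Result(None, False, max_rounds)


# Stub agents (assumptions): the tester always reports SQL injection; the
# developer concedes only after the tester has made the claim twice.
def stub_tester(code: str, transcript: List[str]) -> str:
    return "CWE-89"


def stub_developer(code: str, transcript: List[str]) -> str:
    claims = sum(1 for line in transcript if line == "tester: CWE-89")
    return "CWE-89" if claims >= 2 else "NOT_VULNERABLE"
```

With the stubs above, calling `discuss(snippet, {"tester": stub_tester, "developer": stub_developer})` runs two rounds before the roles agree. How disagreement is resolved when the round limit is reached is left open here; returning an unresolved result is only one option.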