Large Language Model for Vulnerability Detection and Repair: Literature Review and Roadmap

Advances in Large Language Models (LLMs) have led to their widespread adoption across Software Engineering (SE) tasks, including vulnerability detection and repair. Many recent studies have investigated how LLMs can improve these two tasks. Despite this growing research interest, no survey currently focuses on the use of LLMs for vulnerability detection and repair. In this paper, we aim to bridge this gap with a systematic literature review of approaches that use LLMs to improve vulnerability detection and repair. The review covers research from leading SE, AI, and Security conferences and journals, comprising 36 papers published at 21 distinct venues. By answering three key research questions, we aim to (1) summarize the LLMs employed in the relevant literature, (2) categorize the LLM adaptation techniques used in vulnerability detection, and (3) classify the LLM adaptation techniques used in vulnerability repair. Based on our findings, we identify a series of challenges that existing studies have yet to address. We also outline a roadmap highlighting opportunities that we believe are pertinent and crucial for future research.

