Large Language Models (LLMs) face significant challenges in detecting and repairing vulnerable code, particularly for vulnerabilities that span multiple aspects, such as variables, code flows, and code structures. In this study, we use GitHub Copilot as the LLM and focus on buffer overflow vulnerabilities. Our experiments reveal a notable gap in Copilot's abilities: it detects buffer overflow vulnerabilities at a 76% rate but repairs them at only a 15% rate. To address this gap, we propose context-aware prompt tuning techniques designed to enhance LLM performance in repairing buffer overflows. By injecting a sequence of domain knowledge about the vulnerability, including various security and code contexts, we show that Copilot's successful repair rate increases to 63%, more than a fourfold improvement over repairs without domain knowledge.
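One way to picture the injected domain knowledge is as a sequence of labeled context sections prepended to the repair request. The sketch below is a minimal illustration of that idea, not the paper's actual prompt format: all section labels, function names, and the sample snippet are hypothetical.

```python
# Hypothetical sketch: build a context-aware repair prompt by prepending
# labeled domain-knowledge sections (security and code contexts) to the
# vulnerable code and the repair instruction.

VULNERABLE_SNIPPET = """\
char buf[16];
strcpy(buf, user_input);  /* no bounds check: potential buffer overflow */
"""

def build_context_aware_prompt(code, contexts):
    """Join labeled context sections, the code, and the repair request."""
    sections = [f"### {label}\n{body}" for label, body in contexts]
    sections.append("### Vulnerable code\n" + code)
    sections.append("Rewrite the code so the buffer overflow is eliminated.")
    return "\n\n".join(sections)

prompt = build_context_aware_prompt(
    VULNERABLE_SNIPPET,
    [
        ("Security context",
         "CWE-121: stack-based buffer overflow; the write can exceed "
         "the end of a fixed-size buffer."),
        ("Code context",
         "buf holds at most 15 characters plus a NUL terminator; "
         "user_input is attacker-controlled."),
    ],
)
print(prompt)
```

In this sketch, the ordering mirrors the "sequence of domain knowledge" described above: security context first, then code context, then the code and the repair instruction, so the model sees the relevant constraints before being asked to produce a fix.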