Software vulnerability detection is generally supported by automated static analysis tools, which have recently been reinforced by deep learning (DL) models. However, although DL-based approaches outperform rule-based ones in research settings, applying them in practice remains challenging because of the complex structure of source code, the black-box nature of DL, and the domain knowledge required to understand and validate black-box results before tackling post-detection tasks. Conventional DL models are trained on specific projects and hence excel at identifying vulnerabilities in those projects but not in others. Their poor detection performance on other projects in turn degrades downstream tasks such as vulnerability localization and repair. More importantly, these models do not provide explanations that help developers comprehend detection results. In contrast, Large Language Models (LLMs) have made considerable progress in addressing these issues by leveraging prompting techniques. Unfortunately, their performance in identifying vulnerabilities remains unsatisfactory. This paper contributes \textbf{\DLAP}, a \underline{\textbf{D}}eep \underline{\textbf{L}}earning \underline{\textbf{A}}ugmented LLMs \underline{\textbf{P}}rompting framework that combines the best of both DL models and LLMs to achieve exceptional vulnerability detection performance. Experimental evaluation results confirm that \DLAP outperforms state-of-the-art prompting frameworks, including role-based prompts, auxiliary information prompts, chain-of-thought prompts, and in-context learning prompts, as well as fine-tuning, on multiple metrics.