DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection

Software vulnerability detection is generally supported by automated static analysis tools, which have recently been reinforced by deep learning (DL) models. However, despite the superior performance of DL-based approaches over rule-based ones in research, applying DL approaches to software vulnerability detection in practice remains a challenge due to the complex structure of source code, the black-box nature of DL, and the domain knowledge required to understand and validate the black-box results for addressing tasks after detection. Conventional DL models are trained by specific projects and, hence, excel in identifying vulnerabilities in these projects but not in others. These models with poor performance in vulnerability detection would impact the downstream tasks such as location and repair. More importantly, these models do not provide explanations for developers to comprehend detection results. In contrast, Large Language Models (LLMs) have made lots of progress in addressing these issues by leveraging prompting techniques. Unfortunately, their performance in identifying vulnerabilities is unsatisfactory. This paper contributes \textbf{\DLAP}, a \underline{\textbf{D}}eep \underline{\textbf{L}}earning \underline{\textbf{A}}ugmented LLMs \underline{\textbf{P}}rompting framework that combines the best of both DL models and LLMs to achieve exceptional vulnerability detection performance. Experimental evaluation results confirm that \DLAP outperforms state-of-the-art prompting frameworks, including role-based prompts, auxiliary information prompts, chain-of-thought prompts, and in-context learning prompts, as well as fine-turning on multiple metrics.

翻译：软件漏洞检测通常由自动化静态分析工具支持，近年来深度学习模型进一步增强了这些工具。然而，尽管基于深度学习的检测方法在研究中的性能优于基于规则的方法，但由于源代码结构复杂、深度学习的黑箱特性以及理解与验证黑箱结果所需领域知识的挑战，将深度学习方法应用于实际软件漏洞检测仍然困难。传统深度学习模型通过特定项目训练，因此擅长识别这些项目中的漏洞，但无法有效检测其他项目。这类在漏洞检测中表现不佳的模型会影响后续任务（如定位与修复）。更重要的是，这些模型无法为开发者提供解释以理解检测结果。相比之下，大语言模型通过利用提示技术在解决这些问题上取得了显著进展，但其在漏洞识别方面的性能仍不理想。本文提出了\DLAP——一种结合深度学习模型与大语言模型优势的深度增强的大语言模型提示框架，以实现卓越的漏洞检测性能。实验评估结果证实，\DLAP在多维度指标上优于现有最先进的提示框架，包括基于角色的提示、辅助信息提示、思维链提示、上下文学习提示以及微调方法。

相关内容

MoDELS

关注 46

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日