Previous learning-based vulnerability detection methods relied on either medium-sized pre-trained models or smaller neural networks trained from scratch. Recent advances in Large Pre-Trained Language Models (LLMs) have demonstrated remarkable few-shot learning capabilities across a variety of tasks. However, the effectiveness of LLMs in detecting software vulnerabilities remains largely unexplored. This paper aims to bridge this gap by exploring how LLMs perform with various prompts, focusing on two state-of-the-art LLMs: GPT-3.5 and GPT-4. Our experimental results show that GPT-3.5 achieves performance competitive with the prior state-of-the-art vulnerability detection approach, while GPT-4 consistently outperforms it.