Generative deep learning (DL) models have been successfully adopted for vulnerability patching. However, such models need a large dataset of patches to learn from. To overcome this issue, researchers have proposed starting from models pre-trained with general knowledge, either of the programming language or of similar tasks such as bug fixing. Despite these efforts in automated vulnerability patching, there is a lack of systematic studies on how these different training procedures affect the performance of DL models on this task. This paper makes a twofold contribution to bridge this gap, by (i) comparing existing self-supervised and supervised pre-training solutions for vulnerability patching; and (ii) for the first time, experimenting with different kinds of prompt-tuning for this task. The study required training and testing 23 DL models. We found that supervised pre-training focused on bug fixing, while expensive in terms of data collection, substantially improves DL-based vulnerability patching. Applying prompt-tuning on top of this supervised pre-trained model yields no significant gain in performance. In contrast, prompt-tuning is an effective and cheap way to substantially boost the performance of self-supervised pre-trained models, i.e., those that do not rely on bug-fixing pre-training.