Early detection of security bug reports (SBRs) is critical for timely vulnerability mitigation. We present an evaluation of prompt-engineering and fine-tuning approaches for predicting SBRs using Large Language Models (LLMs). Our findings reveal a distinct trade-off between the two approaches. Prompted proprietary models demonstrate the highest sensitivity to SBRs, achieving a G-measure of 77% and a recall of 74% on average across all datasets, albeit at the cost of a higher false-positive rate, resulting in an average precision of only 22%. Fine-tuned models, by contrast, exhibit the opposite behavior, attaining a lower overall G-measure of 51% but a substantially higher precision of 75%, at the cost of reduced recall of 36%. Although fine-tuning requires a one-time training investment, inference on the largest dataset is up to 50 times faster than with proprietary models. These findings suggest that further investigation is necessary to harness the power of LLMs for SBR prediction.
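To make the reported trade-off concrete, the sketch below computes the three metrics from a confusion matrix. The G-measure here follows one common definition in the SBR-prediction literature, the harmonic mean of recall and (1 − false-positive rate); the confusion-matrix counts are hypothetical, chosen only to illustrate the prompted-model profile of high recall with low precision, and are not taken from the paper's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, and G-measure from confusion-matrix counts.

    G-measure is taken here as the harmonic mean of recall and
    (1 - false-positive rate), one common definition for SBR prediction;
    other works use different variants.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    specificity = 1.0 - fpr  # true-negative rate
    g_measure = (
        2 * recall * specificity / (recall + specificity)
        if recall + specificity
        else 0.0
    )
    return precision, recall, g_measure

# Hypothetical counts illustrating the prompted-model profile:
# nearly all true SBRs are caught (high recall), but many non-security
# reports are flagged too (many false positives), so precision collapses.
p, r, g = classification_metrics(tp=74, fp=262, tn=1065, fn=26)
print(round(p, 2), round(r, 2), round(g, 2))  # → 0.22 0.74 0.77
```

Note how the harmonic-mean form of the G-measure stays high as long as both recall and specificity are high, even when precision is poor on an imbalanced dataset; this is why the prompted models can score G = 77% while flagging many false positives.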