Large language models (LLMs) have demonstrated impressive capabilities on many coding tasks, including summarization, translation, completion, and code generation. However, detecting code vulnerabilities remains a challenging task for LLMs. An effective way to improve LLM performance is in-context learning (ICL): providing few-shot examples similar to the query, along with their correct answers, can improve an LLM's ability to generate correct solutions. Choosing these few-shot examples appropriately, however, is crucial to improving model performance. In this paper, we explore two criteria for choosing few-shot examples for ICL in the code vulnerability detection task. The first criterion considers whether the LLM (consistently) makes a mistake on a sample, with the intuition that the LLM's performance on a sample is informative about that sample's usefulness as a few-shot example. The second criterion considers the similarity of examples to the program under query and chooses few-shot examples as the $k$-nearest neighbors of the given sample. We evaluate the benefits of these criteria individually and in various combinations, using open-source models on multiple datasets.
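As a rough illustration of the similarity-based criterion, the following is a minimal sketch of $k$-nearest-neighbor few-shot selection over code embeddings. The embedding interface (`model.encode`), the helper names, and the choice of cosine similarity and $k=4$ are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np


def embed(code_snippets, model):
    """Encode code snippets into dense vectors with a user-supplied embedding model."""
    return np.asarray(model.encode(code_snippets))


def knn_few_shot(query_code, example_pool, pool_embeddings, model, k=4):
    """Return the k pool examples whose embeddings are most similar to the query program."""
    q = embed([query_code], model)[0]
    # Cosine similarity between the query and every candidate example.
    sims = pool_embeddings @ q / (
        np.linalg.norm(pool_embeddings, axis=1) * np.linalg.norm(q) + 1e-8
    )
    top_k = np.argsort(-sims)[:k]
    return [example_pool[i] for i in top_k]
```

The selected examples, paired with their ground-truth vulnerability labels, would then be placed in the prompt ahead of the query program.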