Large language models (LLMs) such as GPT-3.5 and CodeLlama are powerful models for code generation and understanding. Fine-tuning these models is computationally expensive and requires a large labeled dataset. In-context learning offers an alternative: the model learns a downstream task from only a few examples supplied in the prompt. Recent work has shown that in-context learning performs well in bug detection and repair. In this paper, we propose a code-pair classification task in which both the buggy and the non-buggy version of a piece of code are given to the model, and the model identifies the buggy one. We evaluate the task on a real-world bug detection dataset with two of the most powerful LLMs. Our experiments indicate that an LLM can often tell the buggy version from the non-buggy version of the code, and that code-pair classification is much easier than being given a single snippet and deciding whether and where a bug exists.
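To make the task setup concrete, the following is a minimal sketch of how a code-pair classification prompt might be assembled and how a model reply might be scored. The prompt wording, the helper names (`build_pair_prompt`, `make_example`, `parse_answer`), and the random ordering of the two versions are illustrative assumptions, not the prompt or procedure used in the experiments.

```python
import random
import re
from typing import Optional, Tuple


def build_pair_prompt(snippet_a: str, snippet_b: str) -> str:
    """Format two versions of the same code into one classification prompt.

    The wording below is a hypothetical template, not the paper's prompt.
    """
    return (
        "Exactly one of the following two code snippets contains a bug.\n\n"
        f"Snippet A:\n{snippet_a}\n\n"
        f"Snippet B:\n{snippet_b}\n\n"
        "Answer with a single letter, A or B, naming the buggy snippet."
    )


def make_example(buggy: str, fixed: str, rng: random.Random) -> Tuple[str, str]:
    """Place the buggy version in a random slot (assumed here to limit
    position bias) and return the prompt plus the gold label."""
    if rng.random() < 0.5:
        return build_pair_prompt(buggy, fixed), "A"
    return build_pair_prompt(fixed, buggy), "B"


def parse_answer(reply: str) -> Optional[str]:
    """Return the first standalone uppercase A or B in the reply, else None.

    Deliberately simplistic: a reply such as "A bug exists in B" would be
    misread as "A", so real evaluations need stricter output constraints.
    """
    match = re.search(r"\b([AB])\b", reply)
    return match.group(1) if match else None


if __name__ == "__main__":
    buggy = "def add(a, b):\n    return a - b"
    fixed = "def add(a, b):\n    return a + b"
    prompt, gold = make_example(buggy, fixed, random.Random(0))
    print(prompt)
    print("gold label:", gold)
```

The prompt string would be sent to whichever LLM is under evaluation, and accuracy is then the fraction of pairs for which `parse_answer` matches the gold label.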