Large language models (LLMs) such as GPT-3.5 and CodeLlama are powerful models for code generation and understanding. Fine-tuning these models incurs a high computational cost and requires a large labeled dataset. Alternatively, in-context learning lets models learn downstream tasks from only a few examples. Recent work has shown that in-context learning performs well on bug detection and repair. In this paper, we propose a code-pair classification task in which both the buggy and non-buggy versions of a code snippet are given to the model, and the model must identify the buggy one. We evaluate this task on a real-world bug-detection dataset with two of the most powerful LLMs. Our experiments indicate that an LLM can often distinguish the buggy version of the code from the non-buggy one, and that code-pair classification is much easier than being given a single snippet and deciding whether and where a bug exists.
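As a minimal sketch of how the code-pair classification setup could be operationalized, the Python below places the buggy and non-buggy versions into one prompt in random order, records which letter names the buggy version, and scores parsed model answers. The prompt wording, the A/B answer format, and the helper names (`make_pair_example`, `parse_answer`, `pair_accuracy`) are illustrative assumptions, not the paper's actual prompts or evaluation code; the LLM call itself is omitted.

```python
import random
import re
from typing import Optional, Sequence, Tuple

# Hypothetical prompt template (assumption): the paper's actual prompt is not shown here.
PROMPT_TEMPLATE = (
    "One of the following two code snippets contains a bug and the other does not.\n"
    "Snippet A:\n{a}\n\n"
    "Snippet B:\n{b}\n\n"
    "Answer with a single letter (A or B) naming the buggy snippet."
)


def make_pair_example(buggy: str, fixed: str, rng: random.Random) -> Tuple[str, str]:
    """Build a code-pair prompt with the two versions in random order.

    Returns (prompt, label), where label is the letter of the buggy snippet.
    Randomizing the order keeps a model from scoring well by always
    answering the same position.
    """
    if rng.random() < 0.5:
        return PROMPT_TEMPLATE.format(a=buggy, b=fixed), "A"
    return PROMPT_TEMPLATE.format(a=fixed, b=buggy), "B"


def parse_answer(response: str) -> Optional[str]:
    """Naively extract the first standalone 'A' or 'B' from a model response."""
    match = re.search(r"\b([AB])\b", response.strip())
    return match.group(1) if match else None


def pair_accuracy(responses: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of pairs where the parsed answer names the buggy snippet."""
    if not labels:
        return 0.0
    correct = sum(parse_answer(r) == y for r, y in zip(responses, labels))
    return correct / len(labels)
```

The parser is deliberately simple (a response such as "A bug is in B" would be misread as "A"); a real evaluation would likely constrain the output format more tightly or inspect responses more carefully.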