Developers spend a significant amount of time editing code for a variety of reasons, such as fixing bugs or adding new features. Designing effective methods to predict code edits has been an active yet challenging area of research because code edits are diverse and developer intent is difficult to capture. In this work, we address these challenges by endowing pre-trained large language models (LLMs) of code with the knowledge of prior, relevant edits. The generative capability of the LLMs helps address the diversity in code changes, and conditioning code generation on prior edits helps capture the latent developer intent. We evaluate two well-known LLMs, Codex and CodeT5, in zero-shot and fine-tuning settings, respectively. In our experiments with two datasets, the knowledge of prior edits boosts the performance of the LLMs significantly and enables them to generate 29% and 54% more correctly edited code in top-1 suggestions relative to the current state-of-the-art symbolic and neural approaches, respectively.