Developers spend a significant amount of time editing code for various reasons, such as fixing bugs or adding new features. Designing effective methods to predict code edits has been an active yet challenging area of research, owing to the diversity of code edits and the difficulty of capturing developer intent. In this work, we address these challenges by endowing pre-trained large language models (LLMs) of code with knowledge of prior, relevant edits. The generative capability of LLMs helps address the diversity of code changes, and conditioning code generation on prior edits helps capture the latent developer intent. We evaluate two well-known LLMs, Codex and CodeT5, in zero-shot and fine-tuning settings, respectively. In our experiments on two datasets, knowledge of prior edits significantly boosts the performance of the LLMs, enabling them to generate 29% and 54% more correctly edited code in top-1 suggestions than the current state-of-the-art symbolic and neural approaches, respectively.