Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing

Developers often dedicate significant time to maintaining and refactoring existing code. However, most prior work on generative models for code focuses solely on creating new code, overlooking the distinctive needs of editing existing code. In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase. Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks. We represent code changes using a line diff format and employ static analysis to form large customized model contexts, ensuring the availability of appropriate information for prediction. We collect a code editing dataset from the commit histories of 1650 open-source Python projects for training and evaluation. In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models (bringing exact-match accuracy from 34.7 up to 60.4), demonstrating the benefits of incorporating editing history for code completion. In a multi-round, multi-edit setting, we observe substantial gains by iteratively conditioning on additional user edits. We have open-sourced our code, data, and model weights to encourage future research and have released a VSCode extension powered by our model for interactive IDE usage.

翻译：摘要：开发人员常需投入大量时间维护和重构现有代码。然而，现有生成式代码模型多聚焦于新代码生成，忽视了代码编辑场景的独特需求。本研究探索多轮代码自动编辑场景，旨在基于同一代码库中的近期变更预测代码区域的编辑操作。我们提出的模型Coeditor是一个专为代码编辑任务微调的语言模型，采用行差异格式表征代码变更，并通过静态分析构建大规模定制化模型上下文，确保预测所需信息的可获取性。我们从1650个开源Python项目的提交历史中收集代码编辑数据集用于训练与评估。在简化的单轮单编辑任务中，Coeditor显著优于GPT-3.5及当前最先进的开源代码补全模型（将精确匹配准确率从34.7%提升至60.4%），证明了融合编辑历史对代码补全的增益效果。在多轮多编辑场景中，通过基于用户新增编辑进行迭代条件建模，我们观察到显著性能提升。为促进后续研究，我们已开源相关代码、数据及模型权重，并发布了基于该模型的VSCode扩展以支持交互式IDE使用。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日