Code editing is essential to software evolution. Many automated code editing tools have been proposed, drawing on both Information Retrieval-based techniques and Machine Learning-based code generation and code editing models. Each technique comes with its own promises and perils, and they are often used together so that the strengths of one compensate for the weaknesses of another. This paper proposes a hybrid approach that synthesizes code edits more effectively by combining code search, generation, and modification. Our key observation is that a patch obtained by search and retrieval, even if imperfect, can provide helpful guidance to a code generation model. However, a retrieval-guided patch produced by a code generation model can still be a few tokens off from the intended patch, and such a generated patch can be slightly modified to yield the intended one. We present SARGAM, a novel tool designed to mimic a real developer's code editing behavior: given an original code version, a developer may search for related patches, generate or write the code, and then modify the generated code to adapt it to the right context. Our evaluation of SARGAM on edit generation shows that it outperforms current state-of-the-art techniques. SARGAM is also effective on automated program repair tasks.
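The search, generate, and modify workflow described above can be illustrated with a minimal sketch. This is not SARGAM's implementation: every name here (`search`, `generate`, `modify`, `synthesize_edit`, `CORPUS`) is hypothetical, retrieval is approximated with token-level similarity from Python's `difflib`, and the generation step is a placeholder that simply reuses the retrieved patch where a real system would presumably call a learned model.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# A patch is a (code before edit, code after edit) pair. Hypothetical toy data.
Patch = Tuple[str, str]

CORPUS: List[Patch] = [
    ("if ( x == null ) return ;",
     "if ( x == null ) throw new IllegalArgumentException ( ) ;"),
    ("for ( int i = 0 ; i < n ; i ++ )",
     "for ( int i = 0 ; i <= n ; i ++ )"),
]


def search(original: str, corpus: List[Patch]) -> Patch:
    """Step 1: retrieve the past patch whose pre-edit code is most similar."""
    tokens = original.split()
    return max(
        corpus,
        key=lambda p: SequenceMatcher(None, tokens, p[0].split()).ratio(),
    )


def generate(original: str, retrieved: Patch) -> str:
    """Step 2: placeholder generator (an assumption, not a learned model).

    A real generator would be conditioned on both `original` and `retrieved`;
    this stub just returns the retrieved post-edit code, which is typically
    a few tokens off from the intended patch.
    """
    return retrieved[1]


def modify(original: str, retrieved: Patch, draft: str) -> str:
    """Step 3: adapt the draft to the original context with token-level edits.

    Tokens that differ between the retrieved pre-edit code and the original
    (equal-length 'replace' spans) are treated as renames and applied to the draft.
    """
    before, target = retrieved[0].split(), original.split()
    renames = {}
    matcher = SequenceMatcher(None, before, target)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            renames.update(zip(before[i1:i2], target[j1:j2]))
    return " ".join(renames.get(tok, tok) for tok in draft.split())


def synthesize_edit(original: str, corpus: List[Patch]) -> str:
    """Chain the three steps: search, then generate, then modify."""
    retrieved = search(original, corpus)
    draft = generate(original, retrieved)
    return modify(original, retrieved, draft)


if __name__ == "__main__":
    print(synthesize_edit("if ( user == null ) return ;", CORPUS))
    # prints: if ( user == null ) throw new IllegalArgumentException ( ) ;
```

In this toy run, the retrieved patch supplies the shape of the edit (replace an early return with a thrown exception), while the modification step corrects the one token, `x`, that does not fit the new context. The split mirrors the observation above that a retrieval-guided draft may be only a few tokens away from the intended patch.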