Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example

From arXiv. This paper has been accepted to the Proceedings of the 32nd ACM Symposium on the Foundations of Software Engineering (FSE 2024). This is an author copy.

Software developers often repeat code changes, known as "code change patterns" (CPATs), within and across projects. Automating these CPATs accelerates development, but current Transformation by Example (TBE) techniques are limited by the quality and quantity of their input examples: they miss variations that differ in syntax or control flow yet are semantically similar. Large Language Models (LLMs), trained on vast code datasets, can overcome these limitations by generating semantically equivalent, previously unseen CPAT variants, which makes TBE more effective. We identified best practices for using LLMs to generate code variants that meet the criteria of correctness, usefulness, and applicability. We implemented these practices in PyCraft, which combines static and dynamic analysis with LLMs. PyCraft achieved an F-measure of 96.6% in identifying correct variants, expanded the inputs by 58x on average, and automated changes to increase target codes by up to 39x. Patches from PyCraft were submitted to projects such as microsoft/DeepSpeed and IBM/inFairness, with an 83% acceptance rate, validating the usefulness of our approach.
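To make the idea of "semantically similar variants" concrete, here is a minimal sketch in Python. It is an illustration only, not PyCraft's implementation: the CPAT (replacing a manual accumulation loop with the built-in `sum()`), the function names, and the sample inputs are all hypothetical. It shows two syntactically different "before" forms that a rule inferred from only the first would likely miss, plus a simple dynamic check that the forms agree on sample inputs.

```python
# Hypothetical CPAT: replace a manual accumulation loop with built-in sum().
# All names and inputs below are illustrative assumptions, not from the paper.

def total_before(values):
    # Original "before" example: for-loop accumulation.
    result = 0
    for v in values:
        result += v
    return result


def total_variant_while(values):
    # A variant with different syntax and control flow (index-based while
    # loop) that computes the same thing as total_before.
    result = 0
    i = 0
    while i < len(values):
        result = result + values[i]
        i += 1
    return result


def total_after(values):
    # The "after" side of the change pattern.
    return sum(values)


def behaves_the_same(candidate, reference, inputs):
    # Dynamic check: the candidate matches the reference on every sample
    # input. Agreement on samples suggests, but does not prove, equivalence.
    return all(candidate(x) == reference(x) for x in inputs)


sample_inputs = [[], [1], [1, 2, 3], [-4, 4, 10]]

variant_ok = behaves_the_same(total_variant_while, total_before, sample_inputs)
after_ok = behaves_the_same(total_after, total_before, sample_inputs)
```

A variant that passes such a check could then serve as an additional input example for a TBE tool, so that the inferred rule also covers the `while` form.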

