Enhancing Automated Program Repair through Fine-tuning and Prompt Engineering

Sequence-to-sequence models have been used to transform erroneous programs into correct ones when trained on a sufficiently large dataset. Recent studies also provide strong empirical evidence that code review can further improve program repair. Large language models trained on both Natural Language (NL) and Programming Language (PL) can carry inherent knowledge of both. In this study, we investigate whether this inherent knowledge of PL and NL can be used to improve automated program repair. We applied PLBART and CodeT5, two state-of-the-art language models pre-trained on both PL and NL, to two natural language-based program repair datasets and found that the pre-trained language models, when fine-tuned on datasets containing both code reviews and the subsequent code changes, notably outperformed each of the previous models. With the advent of generative code models such as Codex and GPT-3.5-Turbo, we also performed zero-shot and few-shot learning-based prompt engineering to assess their performance on these datasets. However, our manual analysis of the repaired code generated by these models indicates that the practical application of LLMs to automated program repair is still a long way off.
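To make the zero-shot versus few-shot distinction concrete, the following is a minimal sketch of how a repair prompt might be assembled from a buggy snippet and its code review comment. It is not the authors' prompt: the `RepairExample` class, the `build_prompt` function, the section labels, and the instruction wording are all hypothetical names chosen for illustration, and no model API is called.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RepairExample:
    """Hypothetical container for one repair instance (not from the paper)."""
    buggy_code: str
    review_comment: str
    fixed_code: str = ""  # left empty for the query the model should repair


def format_example(example: RepairExample, include_fix: bool) -> str:
    """Render one instance; the fix is shown only for demonstrations."""
    parts = [
        "### Buggy code:",
        example.buggy_code.strip(),
        "### Code review:",
        example.review_comment.strip(),
        "### Fixed code:",
    ]
    if include_fix:
        parts.append(example.fixed_code.strip())
    return "\n".join(parts)


def build_prompt(
    query: RepairExample,
    demonstrations: Sequence[RepairExample] = (),
) -> str:
    """Build a zero-shot prompt (no demonstrations) or a few-shot prompt.

    The instruction text is an assumption made for this sketch.
    """
    instruction = "Revise the buggy code so that it addresses the code review."
    blocks = [instruction]
    # Few-shot: each demonstration includes its known fix.
    blocks += [format_example(d, include_fix=True) for d in demonstrations]
    # The query ends with an open "Fixed code" slot for the model to complete.
    blocks.append(format_example(query, include_fix=False))
    return "\n\n".join(blocks)
```

Calling `build_prompt(query)` with no demonstrations yields a zero-shot prompt, while passing a few solved `RepairExample` instances yields a few-shot prompt that ends with the same open slot. The resulting string would then be sent to whichever generative model is being evaluated.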

