In this paper, we first show that increasing beam size, even for small LLMs (1B-7B parameters), demands extensive GPU memory in LLM-based APR, with up to 80% of recurring crashes caused by memory overloads. Two seemingly simple remedies for reducing memory consumption are (1) quantizing the LLM, i.e., converting its weights from high-precision to lower-precision values, and (2) making beam search sequential, i.e., forwarding each beam through the model one at a time and then concatenating the outputs back into a single result. However, through both theoretical analysis and experiments, we show that these approaches remain ineffective. To address this, we introduce FLAMES, a novel LLM-based APR technique that employs semantic-guided patch generation to improve both repair effectiveness and memory efficiency. Unlike conventional methods that rely on beam search, FLAMES uses greedy decoding for memory efficiency while steering the search toward more promising repair candidates via a semantic-guided best-first search algorithm. At each decoding step, FLAMES uses semantic feedback from test validation, such as the number of passing and failing test cases, to select the most promising token to explore further. Our empirical evaluation on Defects4J shows that FLAMES reduces memory consumption by up to 83% compared to conventional LLM-based APR without compromising time efficiency. Moreover, FLAMES correctly fixes 133 bugs on Defects4J, 10 more than the best baseline. These improvements also generalize to the HumanEval-Java and TransformedD4J datasets, where FLAMES generates 12% and 36.5% more correct patches, respectively, than the best baseline.
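The core idea, a best-first search over partial token sequences ranked by test-validation feedback rather than pure model likelihood, can be sketched as follows. This is a minimal illustration only, not the paper's implementation: `next_token_probs` stands in for a real LLM's next-token distribution, and `semantic_score` stands in for running the test suite on a candidate patch; both are hypothetical toy functions.

```python
import heapq

# Toy stand-ins (assumptions for illustration, not FLAMES itself):
# - next_token_probs: fake LM returning the k most likely next tokens
# - semantic_score: fake test-validation feedback (passing-test count)
VOCAB = {"a": 0.6, "b": 0.3, "c": 0.1}

def next_token_probs(prefix, k=2):
    """Return the k most likely next tokens for a prefix (toy model)."""
    return sorted(VOCAB.items(), key=lambda kv: -kv[1])[:k]

def semantic_score(tokens):
    """Toy semantic feedback: reward tokens matching a 'correct patch'."""
    target = ["a", "a", "b"]
    return sum(1 for t, g in zip(tokens, target) if t == g)

def best_first_repair(max_len=3, k=2):
    """Expand the most semantically promising partial patch first.

    At each step only k candidate continuations are scored, so memory
    stays near greedy decoding instead of growing with beam width.
    """
    # Min-heap keyed by negated semantic score => best candidate first.
    heap = [(0, [])]
    while heap:
        _, tokens = heapq.heappop(heap)
        if len(tokens) == max_len:
            return tokens  # first complete candidate in best-first order
        for tok, _p in next_token_probs(tokens, k):
            cand = tokens + [tok]
            heapq.heappush(heap, (-semantic_score(cand), cand))
    return None

print(best_first_repair())  # → ['a', 'a', 'b']
```

In this toy run the search abandons the greedy-looking prefix `["a", "b"]` as soon as the feedback signal favors `["a", "a"]`, which is the behavior the abstract attributes to semantic guidance.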