Large Language Models for Code (LLMs4Code) have demonstrated outstanding performance in the software engineering domain, especially in coding tasks. However, even the most advanced LLMs4Code inevitably contain incorrect or outdated code knowledge. Given the high cost of training LLMs4Code, it is impractical to re-train the models to fix such problematic knowledge. Model editing is an emerging technical field for effectively and efficiently correcting erroneous knowledge in LLMs, and various model editing techniques and benchmarks have been proposed recently. Nevertheless, a comprehensive study that thoroughly compares and analyzes the performance of state-of-the-art model editing techniques for adapting the knowledge within LLMs4Code across various code-related tasks is notably absent. To bridge this gap, we perform the first systematic study on applying state-of-the-art model editing approaches to repair inaccuracies in LLMs4Code. To that end, we introduce a benchmark named CLMEEval, which consists of two datasets: CoNaLa-Edit (CNLE) with 21K+ code generation samples and CodeSearchNet-Edit (CSNE) with 16K+ code summarization samples. With the help of CLMEEval, we evaluate six advanced model editing techniques on three LLMs4Code: CodeLlama (7B), CodeQwen1.5 (7B), and Stable-Code (3B). Our findings include that the external memorization-based GRACE approach achieves the best knowledge editing effectiveness and specificity (the editing does not influence untargeted knowledge), while generalization (whether the editing can generalize to other semantically-identical inputs) remains a universal challenge for existing techniques. Furthermore, building on in-depth case analysis, we introduce an enhanced version of GRACE called A-GRACE, which incorporates contrastive learning to better capture the semantics of the inputs.
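To make the external memorization idea concrete, the following is a minimal sketch (not the paper's implementation; all names and thresholds are illustrative) of a GRACE-style codebook: edits are cached as (key, value) pairs over hidden activations, and a deferral radius decides whether an edit fires or the original activation passes through unchanged, which is what preserves specificity.

```python
import numpy as np


class GraceStyleCodebook:
    """Illustrative external key-value memory in the spirit of GRACE.

    Edits are stored as (key, value) pairs of hidden activations; at
    inference, an input activation within `radius` of a cached key is
    replaced by the stored corrected value, otherwise it is returned
    untouched (so untargeted knowledge is unaffected).
    """

    def __init__(self, radius: float = 0.5):
        self.keys = []    # activations of the edited inputs
        self.values = []  # replacement activations encoding each fix
        self.radius = radius

    def add_edit(self, key, value):
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(np.asarray(value, dtype=float))

    def __call__(self, hidden):
        hidden = np.asarray(hidden, dtype=float)
        if not self.keys:
            return hidden
        dists = [np.linalg.norm(hidden - k) for k in self.keys]
        i = int(np.argmin(dists))
        # Fire the edit only inside the deferral radius.
        return self.values[i] if dists[i] <= self.radius else hidden
```

The generalization weakness reported in the study follows directly from this design: a semantically equivalent rephrasing that lands outside the deferral radius never triggers the edit, which is the gap A-GRACE's contrastive encoding of inputs is meant to narrow.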