Deep learning (DL)-based code completion tools have transformed software development by enabling advanced code generation. These tools leverage models trained on vast amounts of code from numerous repositories, capturing general coding patterns. However, the impact of fine-tuning these models for specific organizations or developers to boost their performance on the code of such subjects remains unexplored. In this work, we fill this gap with solid empirical evidence. More specifically, we consider 136 developers from two organizations (Apache and Spring), two model architectures (T5 and Code Llama), and three model sizes (60M, 750M, and 7B trainable parameters). The T5 models (60M, 750M) were pre-trained and fine-tuned on over 2,000 open-source projects, excluding the subject organizations' data, and compared against versions further fine-tuned on organization- and developer-specific datasets. For the Code Llama model (7B), we compared the publicly available pre-trained checkpoint with the same model fine-tuned via parameter-efficient fine-tuning on organization- and developer-specific datasets. Our results show that both organization-specific and developer-specific additional fine-tuning boost prediction capabilities, with the former being particularly effective. This finding generalizes across (i) the two subject organizations (i.e., Apache and Spring) and (ii) models of completely different scale (from 60M to 7B trainable parameters). Finally, we show that DL models fine-tuned on an organization-specific dataset achieve the same completion performance as pre-trained code models that are $\sim$10$\times$ larger and used out of the box, with consequent savings in deployment and inference costs (e.g., smaller GPUs are needed).
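To make the parameter-efficient fine-tuning step concrete, the following is a minimal sketch of LoRA-based adaptation of the public Code Llama 7B checkpoint, assuming the Hugging Face transformers and peft libraries. The checkpoint name matches the public release; the LoRA rank, alpha, and target modules are illustrative assumptions, not the configuration reported in this study.

```python
# Minimal LoRA sketch: freeze the 7B base weights and train only small
# low-rank adapter matrices. Hyperparameters below are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "codellama/CodeLlama-7b-hf"  # publicly available pre-trained model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,                                 # adapter rank (assumed value)
    lora_alpha=32,                        # scaling factor (assumed value)
    target_modules=["q_proj", "v_proj"],  # attention projections commonly adapted
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the 7B is trainable
```

The adapted model can then be trained as usual on an organization- or developer-specific dataset; because only the adapters receive gradients, the memory and compute cost of fine-tuning the 7B model is drastically reduced.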