Smaller, Faster, Greener: Compressing Pre-trained Code Models via Surrogate-Assisted Optimization

Large pre-trained models of code have been adopted to tackle many software engineering tasks and have achieved excellent results. However, their large size and high energy consumption prevent them from being widely deployed on developers' computers to provide real-time assistance. A recent study by Shi et al. compresses pre-trained models to a small size. However, other important considerations in deploying models on developers' computers have not been addressed: the model should also have fast inference speed and minimal energy consumption. This requirement motivates us to propose Avatar, a novel approach that reduces model size as well as inference latency and energy consumption without compromising effectiveness (i.e., prediction accuracy). Avatar trains a surrogate model to predict the performance of a tiny model given only its hyperparameter settings. Moreover, Avatar designs a new fitness function embedding multiple key objectives: maximizing the predicted model accuracy and minimizing the model size, inference latency, and energy consumption. After finding the best model hyperparameters using a tailored genetic algorithm (GA), Avatar employs knowledge distillation to train the tiny model. We evaluate Avatar and the baseline approach from Shi et al. on three datasets for two popular software engineering tasks: vulnerability prediction and clone detection. We use Avatar to compress models to a small size (3 MB), which is 160× smaller than the original pre-trained models. Compared with the original models, the compressed models have significantly lower inference latency on all three datasets: on average, our approach reduces inference latency by 62×, 53×, and 186× on the three datasets, respectively. In terms of energy consumption, the compressed models require only 0.8 GFLOPs, which is 173× less than the original pre-trained models.
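The hyperparameter search described above can be sketched as follows. This is a minimal, hypothetical illustration, not Avatar's implementation: the search space, the parameter-count and FLOPs formulas, the latency proxy, the weighted-sum fitness, and the GA operators are all assumptions made for this sketch. In Avatar, the surrogate is a model trained to predict accuracy from hyperparameters. Here `toy_surrogate` is a made-up placeholder that you would replace with such a trained predictor. The knowledge distillation step that trains the selected tiny model is not shown.

```python
import random
from typing import Callable, Dict, List

Config = Dict[str, int]

# Hypothetical search space; Avatar's actual hyperparameters and ranges may differ.
SEARCH_SPACE: Dict[str, List[int]] = {
    "num_layers": [1, 2, 3, 4, 6, 8, 12],
    "hidden_size": [16, 32, 64, 96, 128, 256],
    "intermediate_size": [32, 64, 128, 256, 512, 1024],
    "vocab_size": [1000, 2000, 5000, 10000],
}
SEQ_LEN = 400  # assumed input length in tokens


def _layer_params(c: Config) -> int:
    """Weights in one Transformer encoder layer (biases and layer norms ignored)."""
    h, i = c["hidden_size"], c["intermediate_size"]
    return 4 * h * h + 2 * h * i  # Q/K/V/output projections + two feed-forward matrices


def model_size_mb(c: Config) -> float:
    """Approximate fp32 size: token embeddings plus all encoder layers, 4 bytes per weight."""
    params = c["vocab_size"] * c["hidden_size"] + c["num_layers"] * _layer_params(c)
    return params * 4 / 1e6


def gflops(c: Config) -> float:
    """Rough forward cost: ~2 FLOPs per layer weight per token (attention-score FLOPs ignored)."""
    return 2 * c["num_layers"] * _layer_params(c) * SEQ_LEN / 1e9


def latency_proxy(c: Config) -> float:
    """Hypothetical latency proxy (depth x width); a real system would measure latency."""
    return float(c["num_layers"] * c["hidden_size"])


# Largest configuration in the space, used to normalize FLOPs and latency to [0, 1].
LARGEST: Config = {k: max(v) for k, v in SEARCH_SPACE.items()}


def fitness(c: Config, surrogate: Callable[[Config], float], target_mb: float = 3.0,
            w_size: float = 1.0, w_flops: float = 1.0, w_latency: float = 1.0) -> float:
    """Hypothetical weighted-sum fitness: reward predicted accuracy, penalize the three costs."""
    size_penalty = abs(model_size_mb(c) - target_mb) / target_mb
    flops_penalty = gflops(c) / gflops(LARGEST)
    latency_penalty = latency_proxy(c) / latency_proxy(LARGEST)
    return surrogate(c) - w_size * size_penalty - w_flops * flops_penalty - w_latency * latency_penalty


def toy_surrogate(c: Config) -> float:
    """Placeholder for a trained surrogate: a made-up monotone score, NOT real accuracy data."""
    return min(1.0, 0.5 + 0.03 * c["num_layers"] + 0.001 * c["hidden_size"] + 1e-5 * c["vocab_size"])


def genetic_search(surrogate: Callable[[Config], float], pop_size: int = 20,
                   generations: int = 30, mutation_rate: float = 0.2, seed: int = 0) -> Config:
    """Simple GA: elitism, tournament selection, uniform crossover, per-gene mutation."""
    rng = random.Random(seed)
    keys = list(SEARCH_SPACE)

    def score(c: Config) -> float:
        return fitness(c, surrogate)

    population = [{k: rng.choice(SEARCH_SPACE[k]) for k in keys} for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = sorted(population, key=score, reverse=True)[:2]  # keep the two best unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: each parent is the fitter of two random individuals.
            p1 = max(rng.sample(population, 2), key=score)
            p2 = max(rng.sample(population, 2), key=score)
            # Uniform crossover: each gene is copied from one parent at random.
            child = {k: (p1 if rng.random() < 0.5 else p2)[k] for k in keys}
            # Mutation: resample each gene with probability mutation_rate.
            for k in keys:
                if rng.random() < mutation_rate:
                    child[k] = rng.choice(SEARCH_SPACE[k])
            next_pop.append(child)
        population = next_pop
    return max(population, key=score)


if __name__ == "__main__":
    best = genetic_search(toy_surrogate)
    print(best, f"{model_size_mb(best):.2f} MB", f"{gflops(best):.4f} GFLOPs")
```

In this sketch, the fitness penalizes distance from the 3 MB target size reported above and normalizes FLOPs (a stand-in for energy) and the latency proxy by the largest configuration in the space. The weights `w_size`, `w_flops`, and `w_latency` therefore set the trade-off between accuracy and cost. How Avatar itself weights and normalizes its objectives is specified in the paper, not here.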
