Large pre-trained models of code have been adopted to tackle many software engineering tasks and have achieved excellent results. However, their large model size and high energy consumption prevent them from being widely deployed on developers' computers to provide real-time assistance. A recent study by Shi et al. compresses pre-trained models into a small size. However, other important considerations in deploying models have not been addressed: the model should have fast inference speed and minimal energy consumption. This requirement motivates us to propose Avatar, a novel approach that reduces the model size as well as the inference latency and energy consumption without compromising effectiveness (i.e., prediction accuracy). Avatar trains a surrogate model to predict the performance of a tiny model given only its hyperparameter settings. Moreover, Avatar designs a new fitness function that embeds multiple key objectives: maximizing the predicted model accuracy while minimizing the model size, inference latency, and energy consumption. After finding the best model hyperparameters using a tailored genetic algorithm (GA), Avatar employs knowledge distillation to train the tiny model. We evaluate Avatar and the baseline approach from Shi et al. on three datasets for two popular software engineering tasks: vulnerability prediction and clone detection. We use Avatar to compress models to a small size (3 MB), which is 160$\times$ smaller than the original pre-trained models. Compared with the original models, the compressed models have significantly lower inference latency on all three datasets: on average, Avatar reduces inference latency by 62$\times$, 53$\times$, and 186$\times$ on the three datasets, respectively. In terms of energy consumption, the compressed models require only 0.8 GFLOPs, 173$\times$ fewer than the original pre-trained models.
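The hyperparameter search described above can be illustrated with a small genetic algorithm over tiny-model configurations. This is a minimal sketch under stated assumptions, not Avatar's implementation: the search space, the rough parameter/FLOP/latency estimates, the fitness weights, and `predicted_accuracy` (a stand-in for Avatar's trained surrogate model) are all hypothetical, and the weighted-sum form of the fitness function is only one way to combine "maximize accuracy, minimize size, latency, and energy".

```python
import math
import random

# Hypothetical search space for a tiny Transformer; not the paper's exact space.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4, 6],
    "hidden_size": [16, 32, 64, 96, 128],
    "num_heads": [1, 2, 4, 8],
    "intermediate_size": [32, 64, 128, 256, 512],
    "vocab_size": [1000, 2000, 4000, 8000],
    "max_seq_len": [128, 256, 400, 512],
}


def num_params(cfg):
    """Rough parameter count (embeddings + Transformer layers, biases ignored)."""
    h, i = cfg["hidden_size"], cfg["intermediate_size"]
    embeddings = (cfg["vocab_size"] + cfg["max_seq_len"]) * h
    per_layer = 4 * h * h + 2 * h * i  # Q/K/V/O projections + feed-forward
    return embeddings + cfg["num_layers"] * per_layer


def size_mb(cfg):
    return num_params(cfg) * 4 / 1e6  # assumes float32 weights


def gflops(cfg):
    """Rough forward-pass cost for one input of max_seq_len tokens."""
    h, i, n = cfg["hidden_size"], cfg["intermediate_size"], cfg["max_seq_len"]
    per_layer = 2 * n * (4 * h * h + 2 * h * i) + 4 * n * n * h  # matmuls + attention
    return cfg["num_layers"] * per_layer / 1e9


def latency_ms(cfg, gflops_per_s=50.0, layer_overhead_ms=0.5):
    """Hypothetical latency proxy: compute time plus a fixed per-layer overhead."""
    return 1000 * gflops(cfg) / gflops_per_s + layer_overhead_ms * cfg["num_layers"]


def predicted_accuracy(cfg):
    """Hypothetical stand-in for a trained surrogate (a fitted regressor in practice)."""
    return 0.5 + 0.15 * (1 - math.exp(-num_params(cfg) / 2e5))


def fitness(cfg, target_mb=3.0, w_size=0.05, w_flops=0.05, w_lat=0.05):
    """Weighted-sum fitness (hypothetical weights): reward accuracy, penalize costs."""
    return (predicted_accuracy(cfg)
            - w_size * size_mb(cfg) / target_mb
            - w_flops * gflops(cfg)
            - w_lat * latency_ms(cfg) / 10.0)


def random_config(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal probability.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(cfg, rng, rate=0.2):
    # Each gene is resampled from its allowed values with probability `rate`.
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in cfg.items()}


def tournament(pop, rng, k=3):
    return max(rng.sample(pop, k), key=fitness)


def genetic_search(pop_size=20, generations=30, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    history = []  # best fitness of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        children = pop[:elite]  # elitism: the best configs survive unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    return max(pop, key=fitness), history


if __name__ == "__main__":
    best, history = genetic_search()
    print(best, f"{size_mb(best):.2f} MB", f"{gflops(best):.3f} GFLOPs")
```

Elitism keeps the best configuration of every generation, so the best fitness never decreases across generations. In the pipeline the abstract describes, the configuration returned by the search would then be trained with knowledge distillation from the large pre-trained model; that training step is not shown here.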