Large language models of code have proven remarkably effective across a range of software engineering tasks. Although many cloud services are built on these models, developers cannot always take full advantage of them, for reasons such as restricted or unreliable internet access and institutional privacy policies that prohibit transmitting code to third-party vendors. Developing a compact, efficient, and energy-saving model that can be deployed on developers' devices is therefore essential. To this end, we propose Avatar, a novel approach that crafts a deployable model from a large language model of code by optimizing model size, inference latency, energy consumption, and carbon footprint while maintaining a comparable level of effectiveness. The key idea of Avatar is to formulate the optimization of language models as a multi-objective configuration tuning problem and to solve it with a Satisfiability Modulo Theories (SMT) solver and a tailored optimization algorithm. The SMT solver is used to form an appropriate configuration space, while the optimization algorithm identifies the Pareto-optimal set of configurations, which are then used to train the optimized models via knowledge distillation. We evaluate Avatar with two popular language models of code, CodeBERT and GraphCodeBERT, on two popular tasks, vulnerability prediction and clone detection. Avatar produces optimized models of only 3 MB, 160$\times$ smaller than the original large models. On the two tasks, the optimized models significantly reduce energy consumption (up to 184$\times$ less), carbon footprint (up to 157$\times$ less), and inference latency (up to 76$\times$ faster), with only a negligible loss in effectiveness (1.67\% on average).
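To make the idea of multi-objective configuration tuning concrete, the following is a minimal sketch and not Avatar's actual implementation. It assumes a hypothetical three-parameter configuration (`layers`, `hidden`, `vocab`), a rough fp32 size estimate, and two made-up proxy objectives. Plain enumeration with a size filter stands in for the SMT solver, and an exhaustive dominance check stands in for the tailored optimization algorithm. Only the 3 MB budget comes from the text above.

```python
from itertools import product
from typing import NamedTuple


class Config(NamedTuple):
    # Hypothetical configuration knobs, chosen for illustration only.
    layers: int
    hidden: int
    vocab: int


BUDGET_MB = 3.0  # target model size quoted in the abstract


def size_mb(c: Config) -> float:
    """Rough fp32 size: embedding table plus ~12*h^2 weights per transformer layer."""
    params = c.vocab * c.hidden + c.layers * 12 * c.hidden ** 2
    return params * 4 / 1e6


def cost_proxy(c: Config) -> float:
    """Assumed stand-in for latency/energy: per-token work in the layers."""
    return float(c.layers * c.hidden ** 2)


def capacity_proxy(c: Config) -> float:
    """Assumed stand-in for effectiveness (larger is better)."""
    return float(c.layers * c.hidden)


def objectives(c: Config) -> tuple:
    # Every entry is minimized, so capacity is negated.
    return (size_mb(c), cost_proxy(c), -capacity_proxy(c))


def dominates(a, b) -> bool:
    """a dominates b: no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(configs):
    """Keep the configurations that no other configuration dominates."""
    scored = [(c, objectives(c)) for c in configs]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]


# Step 1 (SMT solver in the paper, brute-force filter here): form the
# configuration space that satisfies the size constraint.
space = [Config(l, h, v) for l, h, v in
         product((1, 2, 4, 6), (16, 32, 64, 128), (1000, 5000, 10000))]
feasible = [c for c in space if size_mb(c) <= BUDGET_MB]

# Step 2 (tailored optimizer in the paper, exhaustive check here): find the
# Pareto-optimal configurations among the feasible ones.
front = pareto_front(feasible)

for c in front:
    print(c, f"{size_mb(c):.2f} MB")
```

Each configuration on the resulting front would then define a small student model to be trained from the large model via knowledge distillation. In practice the effectiveness objective cannot be read off a closed-form proxy like `capacity_proxy` and would have to be measured or predicted.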