Large language models of code have proven highly effective across a range of software engineering tasks. Although many cloud services are built on these models, developers cannot always take full advantage of them, for reasons that include restricted or unreliable internet access and institutional privacy policies that prohibit sending code to third-party vendors. Developing a compact, efficient, and energy-saving model that can be deployed on developers' devices is therefore essential. To this end, we propose Avatar, a novel approach that crafts a deployable model from a large language model of code by optimizing model size, inference latency, energy consumption, and carbon footprint while maintaining a comparable level of effectiveness. The key idea of Avatar is to formulate the optimization of language models as a multi-objective configuration tuning problem and to solve it with a Satisfiability Modulo Theories (SMT) solver and a tailored optimization algorithm. The SMT solver is used to form an appropriate configuration space, while the optimization algorithm identifies the Pareto-optimal set of configurations, which are then used to train the optimized models via knowledge distillation. We evaluate Avatar with two popular language models of code, CodeBERT and GraphCodeBERT, on two popular tasks, vulnerability prediction and clone detection. Avatar produces optimized models with a small size (3 MB), which is 160$\times$ smaller than the original large models. On the two tasks, the optimized models significantly reduce energy consumption (up to 184$\times$ less), carbon footprint (up to 157$\times$ less), and inference latency (up to 76$\times$ faster), with only a negligible loss in effectiveness (1.67\% on average).
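To make the notion of a Pareto-optimal set of configurations concrete, the following is a minimal sketch of Pareto filtering over candidate model configurations. It is not Avatar's optimization algorithm. The `Candidate` type, the choice of three objectives, and all numeric values are hypothetical and chosen purely for illustration. A candidate is kept only if no other candidate is at least as good on every objective and strictly better on at least one.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate configuration with illustrative objective values."""
    name: str
    size_mb: float     # model size, lower is better
    latency_ms: float  # inference latency, lower is better
    accuracy: float    # task effectiveness, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    no_worse = (
        a.size_mb <= b.size_mb
        and a.latency_ms <= b.latency_ms
        and a.accuracy >= b.accuracy
    )
    strictly_better = (
        a.size_mb < b.size_mb
        or a.latency_ms < b.latency_ms
        or a.accuracy > b.accuracy
    )
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up values for illustration only; these are not results from the paper.
candidates = [
    Candidate("A", size_mb=3.0, latency_ms=10.0, accuracy=0.60),
    Candidate("B", size_mb=5.0, latency_ms=12.0, accuracy=0.62),
    Candidate("C", size_mb=5.0, latency_ms=15.0, accuracy=0.61),  # dominated by B
    Candidate("D", size_mb=8.0, latency_ms=9.0, accuracy=0.59),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy example, C is dropped because B matches its size while being faster and more accurate. A, B, and D each win on at least one objective, so they remain as trade-offs. The quadratic pairwise check is adequate for small candidate sets; a search over a large configuration space would need a more scalable strategy.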