ML-powered code generation aims to help developers write code more productively by generating code blocks from natural language prompts. Recently, large pretrained deep learning models have substantially pushed the boundary of code generation and achieved impressive performance. Despite their power, the huge number of model parameters poses a significant obstacle to adopting them in a regular software development environment, where a developer might use a standard laptop or mid-size server to develop her code. Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as a carbon footprint. Model compression is a promising approach to address these challenges. Several techniques have been proposed to compress large pretrained models, typically those used for vision or textual data. Among the many available compression techniques, we identify quantization as the most applicable to the code generation task, as it does not require significant retraining cost. Because quantization represents model parameters with lower-bit integers (e.g., int8), both model size and runtime latency benefit from the integer representation. We extensively study the impact of quantized models on code generation tasks across three dimensions: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. Through systematic experiments, we find a quantization recipe that can run even a $6$B model on a regular laptop without significant accuracy or robustness degradation. We further find that the recipe is readily applicable to the code summarization task as well.
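To make the idea of representing parameters with lower-bit integers concrete, the following is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an illustration under stated assumptions, not the recipe studied in this work: the function names (`quantize_int8`, `dequantize`, `weight_gigabytes`), the single per-tensor scale, and the synthetic Gaussian weights are all hypothetical choices made for the example. The final lines give a back-of-the-envelope estimate of weight storage for a 6B-parameter model, counting parameters only and ignoring activations and quantization metadata.

```python
import random


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Hypothetical helper for illustration; one scale is shared by the whole tensor.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer representation."""
    return [v * scale for v in q]


def weight_gigabytes(num_params, bytes_per_param):
    """Storage for the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Synthetic weights stand in for a real parameter tensor (assumption).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding moves each value by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("scale:", scale)
print("max reconstruction error:", max_error)

# Weight storage for a 6B-parameter model at different precisions.
for label, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(label, weight_gigabytes(6_000_000_000, nbytes), "GB")
```

The reconstruction error of each weight is bounded by half the scale, which is why a well-chosen scale can keep accuracy close to the full-precision model while storing one byte per parameter instead of four.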