Greener yet Powerful: Taming Large Code Generation Models with Quantization

Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun, Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia, Bing Xiang

From arXiv, 10 pages, 7 figures, 10 tables

ML-powered code generation aims to help developers write code more productively by intelligently generating code blocks from natural language prompts. Recently, large pretrained deep learning models have substantially pushed the boundary of code generation and achieved impressive performance. Despite their power, the huge number of model parameters poses a significant obstacle to adopting them in a regular software development environment, where a developer might use a standard laptop or mid-size server to write code. Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as a carbon footprint. Model compression is a promising approach to address these challenges. Several techniques have been proposed to compress large pretrained models, typically those used for vision or textual data. Among the many available compression techniques, we identify quantization as the most applicable to code generation tasks because it does not require costly retraining. Because quantization represents model parameters with lower-bit integers (e.g., int8), both model size and runtime latency benefit from the integer representation. We extensively study the impact of quantized models on code generation tasks across three dimensions: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. Through systematic experiments, we find a quantization recipe that can run even a 6B-parameter model on a regular laptop without significant degradation in accuracy or robustness. We further find that the recipe is readily applicable to the code summarization task as well.
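To make the int8 idea concrete, the following is a minimal sketch of symmetric per-tensor quantization together with a weight-only size estimate for a 6B-parameter model. The abstract does not spell out the paper's actual recipe, so the scheme shown here (a single scale per tensor, rounding to the range [-127, 127]) and the helper names `quantize_int8`, `dequantize`, and `model_size_gb` are assumptions chosen for illustration only.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization (illustrative assumption, not the paper's recipe).

    Maps each float to an integer in [-127, 127] using one shared scale.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # An all-zero (or empty) tensor has no meaningful scale; fall back to 1.0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 values and the shared scale."""
    return [q * scale for q in quantized]


def model_size_gb(num_params: int, bytes_per_param: int) -> float:
    """Weight storage only, in GB (10^9 bytes); ignores activations and runtime overhead."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.02, 1.0]
    quantized, scale = quantize_int8(weights)
    restored = dequantize(quantized, scale)
    print("int8 values:", quantized)
    print("max round-trip error:", max(abs(a - b) for a, b in zip(weights, restored)))

    # Rough weight-only footprint of a 6B-parameter model at different precisions.
    for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
        print(f"{name}: {model_size_gb(6_000_000_000, nbytes):.0f} GB")
```

Under this weight-only estimate, a 6B-parameter model needs about 24 GB in fp32, 12 GB in fp16, and 6 GB in int8, which suggests why int8 brings such a model within reach of laptop-class memory. The round-trip error of each weight is bounded by half the scale, which is the source of the small accuracy loss that quantization trades for the size reduction.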

