Evaluating Quantized Large Language Models for Code Generation on Low-Resource Language Benchmarks

Democratization of AI is an important topic within the broader issue of the digital divide. It is especially relevant to LLMs, which are becoming popular as AI co-pilots but remain hard to access because of their high computational demand. In this study, we evaluate whether quantization is a viable approach to running LLMs on generic consumer devices. The study assesses the performance of five quantized code LLMs on Lua code generation tasks. To evaluate the impact of quantization, models with 7 billion parameters were tested on a consumer laptop at 2-, 4-, and 8-bit integer precision and compared to non-quantized code LLMs with 1.3, 2, and 3 billion parameters. Lua was chosen as a low-resource language to avoid model biases toward high-resource languages. The results suggest that models quantized at 4-bit integer precision offer the best trade-off between performance and model size. These models can be comfortably deployed on an average laptop without a dedicated GPU. Performance drops significantly at 2-bit integer precision. Models at 8-bit integer precision require more inference time, and that extra time does not translate into correspondingly better performance. The 4-bit models with 7 billion parameters also considerably outperform the non-quantized models with fewer parameters, despite having comparable storage and memory demands. While quantization does make smaller LLMs with 7 billion parameters more accessible, these LLMs show low overall performance (less than 50%) on high-precision, low-resource tasks such as Lua code generation. Accessibility is improved, but usability is still not at a practical level comparable to foundational LLMs such as GPT-4o or Llama 3.1 405B.
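The claim that 4-bit 7B models are comparable in size to the smaller non-quantized models can be checked with rough arithmetic. The sketch below is illustrative only: it assumes the non-quantized baselines store weights at 16 bits (the abstract does not state their precision), it counts weights alone and ignores quantization metadata, activations, and the KV cache, and the function name `approx_weight_size_gb` is hypothetical.

```python
def approx_weight_size_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


# 7B-parameter model at the three integer precisions named in the abstract.
quantized = {bits: approx_weight_size_gb(7e9, bits) for bits in (2, 4, 8)}

# ASSUMPTION: non-quantized baselines use 16-bit weights (not stated in the source).
baselines = {p: approx_weight_size_gb(p * 1e9, 16) for p in (1.3, 2, 3)}

for bits, size in quantized.items():
    print(f"7B @ {bits}-bit: ~{size:.2f} GB")
for params, size in baselines.items():
    print(f"{params}B @ 16-bit: ~{size:.2f} GB")
```

Under these assumptions, the 4-bit 7B model needs about 3.5 GB for weights, which falls between the 1.3B (about 2.6 GB) and 3B (about 6 GB) baselines. Real file sizes will differ depending on the quantization format used.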

