Scale is often credited as one of the factors behind the performance gains of LLMs, resulting in models with billions and even trillions of parameters. One limitation of such large models is their high computational requirements, which limit their usage, deployment, and debugging in resource-constrained scenarios. Two commonly used alternatives to bypass these limitations are to use smaller versions of LLMs (e.g., Llama 7B instead of Llama 70B) and to lower memory requirements through quantization. While these approaches effectively address resource constraints, their impact on model performance needs thorough examination. In this study, we perform a comprehensive evaluation of the effect of model scale and quantization on performance. We experiment with two major families of open-source instruct models ranging from 7 billion to 70 billion parameters. Our extensive zero-shot experiments across tasks including natural language understanding, reasoning, misinformation detection, and hallucination reveal that larger models generally outperform their smaller counterparts, suggesting that scale remains an important factor in enhancing performance. We also find that larger models show exceptional resilience to precision reduction, maintaining high accuracy even at 4-bit quantization on numerous tasks, and that under similar memory requirements they serve as a better solution than smaller models run at higher precision.
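As a rough illustration of the kind of setup evaluated here (not the authors' exact pipeline), the sketch below loads an open-source instruct model with 4-bit quantization via Hugging Face transformers and bitsandbytes and runs a single zero-shot prompt; the model name, quantization settings, and prompt are assumptions chosen for demonstration.

```python
# Minimal sketch (assumed configuration, not the paper's exact setup): load an
# instruct model in 4-bit precision and query it zero-shot.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-chat-hf"  # illustrative choice of a 70B instruct model

# NF4 4-bit weights with bfloat16 compute: roughly a 4x reduction in weight
# memory relative to fp16, which is what makes a 70B model fit on modest hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs/CPU automatically
)

# Zero-shot prompting: the task is stated directly, with no in-context examples.
prompt = "Is the following claim true or false? The Great Wall of China is visible from the Moon."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```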