Scale is often credited as one of the factors behind the performance gains of LLMs, leading to models with billions and even trillions of parameters. A key limitation of such large models is their high computational requirements, which restrict their use, deployment, and debugging in resource-constrained settings. Two common ways to work around these limitations are to use smaller versions of LLMs (e.g., Llama 7B instead of Llama 70B) and to lower memory requirements through quantization. While these approaches effectively address resource constraints, their impact on model performance needs thorough examination. In this study, we perform a comprehensive evaluation of the effect of model scale and quantization on performance. We experiment with two major families of open-source instruct models ranging from 7 billion to 70 billion parameters. Our extensive zero-shot experiments across various tasks, including natural language understanding, reasoning, misinformation detection, and hallucination, reveal that larger models generally outperform their smaller counterparts, suggesting that scale remains an important factor in enhancing performance. We find that larger models show exceptional resilience to precision reduction and maintain high accuracy even at 4-bit quantization on numerous tasks, making them a better choice than smaller models at high precision under similar memory requirements.
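To make the memory trade-off concrete, the sketch below shows one way such a setup might look: loading an instruct model with 4-bit weight quantization via Hugging Face transformers and bitsandbytes, then running a zero-shot prompt. The model identifier, configuration values, and prompt are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal sketch (assumption: 4-bit quantization via bitsandbytes through
# Hugging Face transformers; the paper's actual pipeline may differ).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-chat-hf"  # illustrative 70B instruct model

# 4-bit weights take roughly 0.5 bytes per parameter, versus ~2 bytes per
# parameter at 16-bit precision, which is what makes a quantized large model
# comparable in memory to a smaller full-precision one.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store 4-bit weights, compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available devices
)

# Zero-shot query: no task-specific examples or fine-tuning.
prompt = "Is the following claim true or false? The Great Wall of China is visible from the Moon."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```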