Scale is often credited as one of the factors behind the performance gains of LLMs, resulting in models with billions and even trillions of parameters. One limitation of such large models is their high computational requirements, which limit their usage, deployment, and debugging in resource-constrained scenarios. Two commonly used alternatives to bypass these limitations are to use smaller versions of LLMs (e.g., Llama 7B instead of Llama 70B) and to lower memory requirements through quantization. While these approaches effectively address resource constraints, their impact on model performance needs thorough examination. In this study, we perform a comprehensive evaluation of the effect of model scale and quantization on performance. We experiment with two major families of open-source instruct models ranging from 7 billion to 70 billion parameters. Our extensive zero-shot experiments across tasks including natural language understanding, reasoning, misinformation detection, and hallucination reveal that larger models generally outperform their smaller counterparts, suggesting that scale remains an important factor in enhancing performance. We also find that larger models show exceptional resilience to precision reduction, maintaining high accuracy even at 4-bit quantization on numerous tasks, and that under similar memory requirements they serve as a better solution than smaller models run at higher precision.
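As a rough illustration of the kind of setup evaluated here (not the authors' exact pipeline), the sketch below loads an open-source instruct model with 4-bit quantization via Hugging Face transformers and bitsandbytes and runs a single zero-shot prompt; the model name, quantization settings, and prompt are assumptions chosen for demonstration.

```python
# Minimal sketch (assumed configuration, not the paper's exact setup): load an
# instruct model in 4-bit precision and query it zero-shot.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-chat-hf"  # illustrative choice of a 70B instruct model

# NF4 4-bit weights with bfloat16 compute: roughly a 4x reduction in weight
# memory relative to fp16, which is what makes a 70B model fit on modest hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs/CPU automatically
)

# Zero-shot prompting: the task is stated directly, with no in-context examples.
prompt = "Is the following claim true or false? The Great Wall of China is visible from the Moon."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```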