Are Emergent Abilities of Large Language Models a Mirage?

Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher's choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous, predictable changes in model performance. We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities; (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen seemingly emergent abilities in multiple vision tasks across diverse deep networks. Via all three analyses, we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.
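The metric argument can be made concrete with a toy calculation. The sketch below is an illustration under stated assumptions, not the paper's own code: per-token cross-entropy is assumed to fall as a power law in parameter count, the constants `c` and `alpha` and the 10-token target length are hypothetical, and tokens are treated as independent. Under those assumptions, per-token accuracy (a linear metric) improves gradually with scale, while exact match over the whole sequence (a nonlinear metric) stays near zero and then rises steeply, even though the underlying model improves smoothly throughout.

```python
import math


def per_token_accuracy(n_params, c=1e9, alpha=0.5):
    """Hypothetical smooth scaling law (c and alpha are illustrative assumptions).

    Cross-entropy per token is modeled as (n_params / c) ** (-alpha), and the
    probability of emitting the correct token as exp(-cross_entropy).
    """
    cross_entropy = (n_params / c) ** (-alpha)
    return math.exp(-cross_entropy)


def exact_match(p_token, length):
    """Nonlinear metric: every one of `length` tokens must be correct.

    Assumes tokens are independent, so the probability is p_token ** length.
    """
    return p_token ** length


def compare_metrics(scales, length=10):
    """Return (n_params, per-token accuracy, exact-match accuracy) per scale."""
    rows = []
    for n in scales:
        p = per_token_accuracy(n)
        rows.append((n, p, exact_match(p, length)))
    return rows


if __name__ == "__main__":
    # Parameter counts from 1e7 to 1e12, one per order of magnitude.
    for n, p, em in compare_metrics([10 ** k for k in range(7, 13)]):
        print(f"{n:>14.0e}  per-token={p:.4f}  exact-match={em:.2e}")
```

Both columns come from the same smoothly improving model, so only the metric differs. Plotting them against the log of the parameter count would show the gradual curve next to the apparently abrupt one.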

