Large language models, comprising billions of parameters and pre-trained on extensive web-scale corpora, have been claimed to acquire certain capabilities without having been specifically trained on them. These capabilities, referred to as "emergent abilities," have been a driving force in discussions of the potential and risks of language models. A key challenge in evaluating emergent abilities is that they are confounded by model competencies that arise through alternative prompting techniques, including in-context learning, the ability of models to complete a task based on a few examples. We present a novel theory that explains emergent abilities, accounting for their potential confounding factors, and rigorously substantiate it through more than 1,000 experiments. Our findings suggest that purported emergent abilities are not truly emergent but instead result from a combination of in-context learning, model memory, and linguistic knowledge. Our work is a foundational step in explaining language model performance, providing a template for their efficient use and clarifying the paradox of their ability to excel in some instances while faltering in others. Thus, we demonstrate that their capabilities should not be overestimated.