Large language models exhibit emergent capabilities that appear unexpectedly at scale, but a theoretical framework is needed to explain why and how they emerge. We prove that language models are non-ergodic systems and provide a mathematical framework, based on Stuart Kauffman's theory of the adjacent possible (TAP), to explain capability emergence. Our resource-constrained TAP equation demonstrates how architectural, training, and contextual constraints interact to shape model capabilities through phase transitions in semantic space. Through experiments with three different language models, we demonstrate that capabilities emerge through discrete transitions guided by constraint interactions and path-dependent exploration. This framework provides a theoretical foundation for understanding emergence in language models and informs the design of architectures that can steer capability emergence.
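For context, a minimal sketch of the TAP recursion the abstract refers to. The first line is the standard unconstrained form due to Kauffman and collaborators; the constrained variant below it is only an illustrative assumption (the capacity term K and its logistic form are hypothetical here, standing in for the paper's actual resource-constrained equation):

```latex
% Unconstrained TAP recursion: M_t existing objects combine in groups
% of i, each combination yielding a new object with probability
% \alpha_i (typically a rapidly decaying sequence such as \alpha^i).
M_{t+1} = M_t + \sum_{i=1}^{M_t} \alpha_i \binom{M_t}{i}

% Illustrative resource-constrained variant (assumption, not the
% paper's exact form): a logistic factor with capacity K damps the
% combinatorial explosion, so growth saturates instead of diverging.
M_{t+1} = M_t + \left(1 - \frac{M_t}{K}\right) \sum_{i=1}^{M_t} \alpha_i \binom{M_t}{i}
```

The unconstrained recursion grows hyperbolically (faster than exponential), which is why some bounding mechanism, such as the constraints the abstract describes, is needed for the dynamics to exhibit discrete, phase-transition-like behavior rather than immediate blow-up.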