Increases in data, model size, or compute can lead to the sudden learning of specific capabilities by a neural network, a phenomenon often called "emergence". Beyond scientific understanding, establishing the causal factors underlying such emergent capabilities is crucial for enabling risk-regulation frameworks for AI. In this work, we draw inspiration from the study of emergent properties in other fields and propose a phenomenological definition of the concept in the context of neural networks. Our definition implicates the acquisition of specific structures underlying the data-generating process as a cause of sudden performance growth on specific, narrower tasks. We empirically investigate this definition by proposing an experimental system grounded in a context-sensitive formal language, and find that Transformers trained to perform tasks on strings from this language indeed exhibit emergent capabilities. Specifically, we show that once the model has learned the language's underlying grammar and its context-sensitivity-inducing structures, performance on narrower tasks suddenly begins to improve. We then analogize the network's learning dynamics with the process of percolation on a bipartite graph, establishing a formal phase-transition model that predicts the shift in the point of emergence observed experimentally when the data structure is changed. Overall, our experimental and theoretical frameworks mark a step towards better defining, characterizing, and predicting emergence in neural networks.