Spontaneous symmetry breaking in statistical mechanics primarily occurs during phase transitions in the thermodynamic limit, where the Hamiltonian preserves inversion symmetry yet the low-temperature free energy exhibits reduced symmetry. Herein, we demonstrate the emergence of spontaneous symmetry breaking in natural language processing (NLP) models during both pre-training and fine-tuning, even under deterministic dynamics and within a finite training architecture. The phenomenon occurs at the level of individual attention heads, scales down to small subsets of their nodes, and remains valid at the single-node level, where each node acquires the capacity to learn a limited set of tokens after pre-training, or of labels after fine-tuning on a specific classification task. As the number of nodes increases, a crossover in learning ability occurs, governed by the tradeoff between a decrease driven by random guessing among an increased number of possible outputs, and an enhancement driven by nodal cooperation, which exceeds the sum of the individual nodal capabilities. In contrast to spin-glass systems, where a microscopic state of frozen spins cannot be directly linked to the goal of free-energy minimization, each nodal function in this framework contributes explicitly to the global network task and can be upper-bounded using convex hull analysis. The results are demonstrated using a BERT-6 architecture pre-trained on the Wikipedia dataset and fine-tuned on the FewRel classification task.
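To make the node-counting crossover concrete, the following minimal Python sketch (not the authors' code) illustrates how per-node learning capacity could be compared against the random-guess baseline 1/C as the node subset grows. The synthetic `activations` and `labels`, and the linear readout, are assumptions introduced here for illustration, standing in for the activations of a fine-tuned attention head and FewRel-style class labels.

```python
# Minimal sketch (not from the paper): probing how classification accuracy
# scales with the number of attention-head nodes used as features.
# Assumptions: `activations` is a hypothetical stand-in for per-node outputs
# of a fine-tuned head (here synthetic data), `labels` for FewRel-style
# labels; the linear readout is an illustrative choice, not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_nodes, n_labels = 2000, 32, 10

# Synthetic stand-in: each node carries a weak, partly redundant label signal.
labels = rng.integers(0, n_labels, size=n_samples)
signal = np.eye(n_labels)[labels] @ rng.normal(size=(n_labels, n_nodes))
activations = signal + 2.0 * rng.normal(size=(n_samples, n_nodes))

baseline = 1.0 / n_labels  # random-guess accuracy among n_labels outputs

for k in (1, 2, 4, 8, 16, 32):  # growing node subsets
    X = activations[:, :k]
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{k:2d} nodes: accuracy {acc:.3f} (random guess {baseline:.3f})")
```

Under these assumptions, a single node performs near the random-guess baseline, while larger subsets gain accuracy faster than the individual contributions would suggest, mirroring the cooperation-versus-baseline tradeoff described above.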