Neural Thermodynamics: Entropic Forces in Deep and Universal Representation Learning

With the rapid discovery of emergent phenomena in deep learning and large language models, understanding their cause has become an urgent need. Here, we propose a rigorous entropic-force theory for understanding the learning dynamics of neural networks trained with stochastic gradient descent (SGD) and its variants. Building on the theory of parameter symmetries and an entropic loss landscape, we show that representation learning is crucially governed by emergent entropic forces arising from stochasticity and discrete-time updates. These forces systematically break continuous parameter symmetries and preserve discrete ones, leading to a series of gradient balance phenomena that resemble the equipartition property of thermal systems. These phenomena, in turn, (a) explain the universal alignment of neural representations between AI models and lead to a proof of the Platonic Representation Hypothesis, and (b) reconcile the seemingly contradictory observations of sharpness- and flatness-seeking behavior of deep learning optimization. Our theory and experiments demonstrate that a combination of entropic forces and symmetry breaking is key to understanding emergent phenomena in deep learning.
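One ingredient of the abstract, that discrete-time updates break a continuous parameter symmetry and push toward balance, can be illustrated on a toy model. The sketch below is not the paper's theory or code. It assumes a two-parameter model f(x) = u·w·x with squared loss, which is invariant under the rescaling (u, w) → (λu, w/λ). Under continuous-time gradient flow the quantity u² − w² is conserved, whereas one step of plain gradient descent multiplies it by (1 − lr²·g²), where g is the derivative of the loss with respect to the product u·w. The names `gd_step` and `imbalance` and all numeric values are hypothetical choices made for illustration.

```python
def gd_step(u, w, x, y, lr):
    """One gradient-descent step on the loss 0.5 * (u*w*x - y)**2."""
    g = (u * w * x - y) * x  # derivative of the loss w.r.t. the product u*w
    # Chain rule: dL/du = g * w and dL/dw = g * u.
    return u - lr * g * w, w - lr * g * u


def imbalance(u, w):
    """Quantity conserved by gradient flow because of the rescaling symmetry."""
    return u * u - w * w


# Hypothetical starting point and data, chosen only for illustration.
u, w = 2.0, 0.5
x, y = 1.0, 3.0
lr = 0.05

before = imbalance(u, w)
for _ in range(200):
    u, w = gd_step(u, w, x, y, lr)
after = imbalance(u, w)

print("imbalance before:", before)
print("imbalance after: ", after)
print("fitted product u*w:", u * w)
```

In this deterministic setting the shrinkage stops once the gradient vanishes, so the effect is small. With stochastic gradients g keeps fluctuating around zero, so the contraction persists, which is the regime the abstract attributes to entropic forces.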

