Recent experiments in neuroscience reveal that task-relevant variables are often encoded in approximately orthogonal subspaces of neural population activity. These disentangled, or abstract, representations have been observed in multiple brain areas and across different species, and they have been shown to support out-of-distribution generalization and rapid learning of novel tasks. The mechanisms by which such representations emerge remain poorly understood, especially in the case of supervised task behavior. Here, we show mathematically that abstract representations of latent variables are guaranteed to appear in the hidden layer of feedforward nonlinear networks when they are trained on tasks that depend directly on these latent variables. These learned abstract representations reflect the semantics of the input stimuli. To show this, we reformulate the usual optimization over the network weights as a mean-field optimization problem over the distribution of neural preactivations. We then apply this framework to finite-width ReLU networks and show that the hidden layer of these networks exhibits an abstract representation at all global minima of the task objective. Finally, we extend our findings to two broad families of activation functions as well as to deep feedforward architectures. Together, our results provide an explanation for the widely observed abstract representations in both the brain and artificial neural networks. In addition, the general framework that we develop here provides a mathematically tractable toolkit for understanding the emergence of different kinds of representations in task-optimized, feature-learning network models.
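To make the notion of an abstract representation concrete, the following sketch illustrates the orthogonal-subspace criterion in a toy setting. The population activity, the noise level, and the coding-vector readout below are illustrative assumptions, not the paper's construction: two binary latent variables are encoded along orthogonal directions in a simulated hidden layer, and we verify that their empirical coding vectors (differences of condition-averaged activity) are approximately orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four stimulus conditions defined by two binary latent variables (a, b).
latents = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hypothetical hidden-layer activity: each latent is encoded along its own
# direction in a 50-dimensional population; the two directions are made
# exactly orthogonal here, and small isotropic noise is added per trial.
d = 50
u_a = rng.normal(size=d)
u_a /= np.linalg.norm(u_a)
u_b = rng.normal(size=d)
u_b -= (u_b @ u_a) * u_a           # Gram-Schmidt step: remove overlap with u_a
u_b /= np.linalg.norm(u_b)

n_trials = 200                      # trials per condition
A = np.repeat(latents, n_trials, axis=0)
H = (np.outer(A[:, 0], u_a)
     + np.outer(A[:, 1], u_b)
     + 0.05 * rng.normal(size=(A.shape[0], d)))

# Coding vector for each latent: difference of condition-averaged activity.
v_a = H[A[:, 0] == 1].mean(0) - H[A[:, 0] == 0].mean(0)
v_b = H[A[:, 1] == 1].mean(0) - H[A[:, 1] == 0].mean(0)

# Cosine similarity near 0 means the latents occupy ~orthogonal subspaces,
# the operational signature of a disentangled (abstract) representation.
cosine = (v_a @ v_b) / (np.linalg.norm(v_a) * np.linalg.norm(v_b))
print(f"cosine between coding vectors: {cosine:.4f}")
```

The paper's result concerns representations that *emerge* from training rather than being built in; this snippet only shows the geometric property that such trained hidden layers are proven to satisfy at global minima.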