Neural networks possess the crucial ability to generate meaningful representations of task-dependent features. Indeed, with appropriate scaling, supervised learning in neural networks can result in strong, task-dependent feature learning. However, the nature of the emergent representations, which we call the `coding scheme', is still unclear. To understand the emergent coding scheme, we investigate fully-connected, wide neural networks learning classification tasks using the Bayesian framework, in which learning shapes the posterior distribution of the network weights. Consistent with previous findings, our analysis of the feature learning regime (also known as the `non-lazy', `rich', or `mean-field' regime) shows that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity. In linear networks, an analog coding scheme of the task emerges. Despite the strong representations, the mean predictor is identical to that of the lazy case. In nonlinear networks, spontaneous symmetry breaking leads to either redundant or sparse coding schemes. Our findings highlight how network properties such as scaling of weights and neuronal nonlinearity can profoundly influence the emergent representations.