The utility of a learned neural representation depends on how well its geometry supports performance on downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we find that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is observed consistently across a broad class of parameterized tasks in which we modulate the degree of alignment between the geometry of the task inputs and that of the task labels. We analyze the learning dynamics in weight space and show that the differences between Tanh and ReLU networks arise from the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space. By contrast, feature neurons in Tanh networks tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.
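The comparison described above can be illustrated with a minimal sketch. It is not the authors' experimental code: the toy task (four noisy 2D clusters with XOR-style labels), the helper names `make_toy_task`, `train_hidden_layer`, and `centered_kernel_alignment`, the hyperparameters, and the choice of linear centered kernel alignment (CKA) as the geometry measure are all assumptions made for illustration. The sketch trains a one-hidden-layer network with either nonlinearity and reports how strongly the hidden representation aligns with the input geometry and with the label geometry.

```python
import numpy as np


def make_toy_task(n_per_cluster=50, noise=0.1, seed=0):
    """Hypothetical toy task: four 2D clusters with XOR-style labels,
    so the label geometry is not aligned with the input geometry."""
    rng = np.random.default_rng(seed)
    corners = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
    labels = np.array([1.0, -1.0, -1.0, 1.0])  # product of the two signs
    X = np.repeat(corners, n_per_cluster, axis=0)
    X = X + noise * rng.normal(size=X.shape)
    Y = np.repeat(labels, n_per_cluster)[:, None]
    return X, Y


def activate(Z, activation):
    """Return the activation and its derivative with respect to Z."""
    if activation == "tanh":
        H = np.tanh(Z)
        return H, 1.0 - H ** 2
    if activation == "relu":
        return np.maximum(Z, 0.0), (Z > 0).astype(float)
    raise ValueError(f"unknown activation: {activation}")


def train_hidden_layer(X, Y, activation, n_hidden=32, lr=0.01, steps=1000, seed=0):
    """Full-batch gradient descent on the loss (1 / 2n) * sum ||out - Y||^2.
    Returns the hidden-layer representation of X after training."""
    rng = np.random.default_rng(seed)
    n, d_in = X.shape
    d_out = Y.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, d_out))
    b2 = np.zeros(d_out)
    for _ in range(steps):
        H, dH = activate(X @ W1 + b1, activation)
        err = (H @ W2 + b2 - Y) / n      # gradient of the loss w.r.t. the output
        grad_Z = (err @ W2.T) * dH       # backpropagate through the nonlinearity
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ grad_Z)
        b1 -= lr * grad_Z.sum(axis=0)
    H, _ = activate(X @ W1 + b1, activation)
    return H


def centered_kernel_alignment(A, B):
    """Linear CKA between two feature matrices whose rows are samples."""
    A = A - A.mean(axis=0, keepdims=True)
    B = B - B.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(A.T @ B, "fro") ** 2
    return cross / (np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro"))


X, Y = make_toy_task()
scores = {}
for name in ("tanh", "relu"):
    H = train_hidden_layer(X, Y, name)
    scores[name] = {
        "input_alignment": centered_kernel_alignment(H, X),
        "label_alignment": centered_kernel_alignment(H, Y),
    }
    print(name, scores[name])
```

Both alignment scores lie between 0 and 1; a representation that mirrors the targets would score higher on `label_alignment`, and one that preserves the raw inputs would score higher on `input_alignment`. The values from a single toy run depend on the seed, initialization scale, and training length, so they should be read as a way to probe the claim rather than as a reproduction of the reported results.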