Deep neural networks come in many sizes and architectures. The choice of architecture, in conjunction with the dataset and learning algorithm, is commonly understood to affect the learned neural representations. Yet, recent results have shown that different architectures learn representations with striking qualitative similarities. Here we derive an effective theory of representation learning under the assumption that the encoding map from input to hidden representation and the decoding map from representation to output are arbitrary smooth functions. This theory schematizes representation learning dynamics in the regime of complex, large architectures, where hidden representations are not strongly constrained by the parametrization. We show through experiments that the effective theory describes aspects of representation learning dynamics across a range of deep networks with different activation functions and architectures, and exhibits phenomena similar to the "rich" and "lazy" regimes. While many network behaviors depend quantitatively on architecture, our findings point to certain behaviors that are widely conserved once models are sufficiently flexible.