In the context of deep learning models, recent attention has been paid to the surface of the loss function as a way to better understand training with gradient-descent-based methods. The search for an appropriate description of this surface, both analytical and topological, has led to numerous efforts to identify spurious minima and characterize gradient dynamics. Our work contributes to this field by providing a topological measure to evaluate loss complexity in the case of multilayer neural networks. We compare deep and shallow architectures with common sigmoidal activation functions by deriving upper and lower bounds on the complexity of their loss functions. These bounds reveal how that complexity is influenced by the number of hidden units, training models, and the activation function used. Additionally, we find that certain variations in the loss function or model architecture, such as adding an $\ell_2$ regularization term or implementing skip connections in a feedforward network, do not affect loss topology in specific cases.