Previous research has shown that fully-connected networks trained with small initialization and gradient-based methods exhibit a phenomenon known as condensation. The input weights of hidden neurons condense onto isolated orientations during training, which reveals an implicit bias towards simple solutions in parameter space. However, the impact of network structure on condensation has not yet been investigated. In this study, we focus on convolutional neural networks (CNNs). Our experiments suggest that, under small initialization and gradient-based training, kernel weights within the same CNN layer also cluster together during training, demonstrating a significant degree of condensation. Theoretically, we demonstrate that within a finite training period, the kernels of a two-layer CNN with small initialization converge to one or a few directions. This work represents a step towards a better understanding of the non-linear training behavior exhibited by neural networks with specialized structures.
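To make the notion of kernels clustering onto a few directions concrete, the following is a minimal sketch of one way such clustering could be quantified. It assumes pairwise cosine similarity between flattened kernels as the measure, and it treats anti-parallel kernels as sharing a direction; both choices are assumptions made here for illustration. The function names `kernel_cosine_similarity` and `condensed_fraction`, the threshold of 0.99, and the synthetic kernels are hypothetical and are not taken from the study.

```python
import numpy as np


def kernel_cosine_similarity(kernels):
    """Pairwise cosine similarity between the kernels of one layer.

    kernels: array of shape (num_kernels, ...); each kernel is flattened
    into a vector before comparison.
    """
    flat = np.asarray(kernels, dtype=float).reshape(len(kernels), -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    # Guard against division by zero for all-zero kernels.
    unit = flat / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def condensed_fraction(kernels, threshold=0.99):
    """Fraction of distinct kernel pairs that are nearly (anti-)parallel.

    Assumption: |cos| > threshold counts as "same direction".
    """
    sim = kernel_cosine_similarity(kernels)
    i, j = np.triu_indices(sim.shape[0], k=1)  # each pair once, no diagonal
    return float(np.mean(np.abs(sim[i, j]) > threshold))


# Synthetic illustration only (no training involved):
rng = np.random.default_rng(0)

# 16 independent random kernels of shape (channels=3, height=3, width=3).
random_kernels = rng.normal(size=(16, 3, 3, 3))

# 16 kernels that are signed rescalings of a single shared direction.
direction = rng.normal(size=(3, 3, 3))
scales = rng.uniform(0.5, 2.0, size=16) * rng.choice([-1.0, 1.0], size=16)
condensed_kernels = scales[:, None, None, None] * direction

frac_random = condensed_fraction(random_kernels)
frac_condensed = condensed_fraction(condensed_kernels)
```

In this sketch, a layer whose kernels are all rescaled copies of one direction yields a fraction of 1, while independent random kernels yield a fraction near 0. Applying the same measure to the kernels of a trained layer would be one possible way to probe for condensation, under the assumptions stated above.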