Residual connections are among the most important components of modern neural network architectures, mitigating the vanishing gradient problem and enabling the training of much deeper networks. One possible explanation for how residual connections aid the training of deeper networks is that they promote feature reuse. However, we identify and analyze limitations of feature reuse under vanilla residual connections. To address these limitations, we propose a modified training method. Specifically, we give the model an additional opportunity to learn feature reuse through residual connections via two types of training iterations. The first type applies droppath, which enforces feature reuse by randomly dropping a subset of layers. The second type trains the dropped parts of the model while freezing the undropped parts. As a result, the dropped parts learn in a way that encourages feature reuse, since the model relies on the undropped parts, which were trained with feature reuse in mind. Overall, we demonstrate performance improvements in certain cases for models with residual connections on image classification.
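The two iteration types described above can be sketched in miniature. The following is an illustrative toy, not the paper's implementation: "layers" are scalar residual blocks `x -> x + w * x`, and the helper names (`droppath_iteration`, `frozen_iteration`) are assumptions introduced here to show which parameters each iteration type would update.

```python
import random

class ToyResidualNet:
    """Toy stack of scalar residual blocks: x -> x + w * x per layer."""

    def __init__(self, num_layers, seed=0):
        random.seed(seed)
        self.weights = [0.1] * num_layers

    def forward(self, x, dropped):
        # Dropped layers contribute only their identity (skip) path,
        # so a fully dropped stack reduces to the identity mapping.
        for i, w in enumerate(self.weights):
            if i not in dropped:
                x = x + w * x
        return x

def droppath_iteration(net, drop_prob=0.5):
    # Iteration type 1 (droppath): randomly drop a subset of layers and
    # train the survivors, which pushes them toward reusing features.
    dropped = {i for i in range(len(net.weights)) if random.random() < drop_prob}
    trainable = [i for i in range(len(net.weights)) if i not in dropped]
    return dropped, trainable

def frozen_iteration(net, dropped):
    # Iteration type 2: freeze the undropped layers and update only the
    # previously dropped ones, so they learn on top of reused features.
    return sorted(dropped)
```

In a real training loop the `trainable` index sets would select which parameters receive gradient updates (e.g., by toggling `requires_grad` in PyTorch); the sketch only makes the trainable/frozen partition explicit.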