Stimulative Training++: Go Beyond The Performance Limits of Residual Networks

Residual networks have shown great success and have become indispensable in recent deep neural network models. In this work, we re-investigate the training process of residual networks from the novel perspective of social loafing, a concept from social psychology, and propose a new training scheme together with three improved strategies for boosting residual networks beyond their performance limits. Previous research has suggested that residual networks can be considered ensembles of shallow networks, which implies that the final performance of a residual network is influenced by a group of subnetworks. We identify a previously overlooked problem analogous to social loafing: subnetworks within a residual network are prone to exert less effort when working as part of a group than when working alone. We define this problem as \textit{network loafing}. Just as social loafing lowers individual productivity and overall group performance, network loafing inevitably causes sub-par performance. Inspired by solutions from social psychology, we first propose a novel training scheme called stimulative training, which randomly samples a residual subnetwork and calculates the KL divergence loss between the sampled subnetwork and the given residual network for extra supervision. To unleash the potential of stimulative training, we further propose three simple yet effective strategies: a novel KL- loss that aligns only the direction of the network logits, random smaller inputs for subnetworks, and inter-stage sampling rules. Comprehensive experiments and analysis verify the effectiveness of stimulative training as well as its three improved strategies.
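
The core step of stimulative training, sampling a residual subnetwork and measuring its KL divergence from the full network, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy `tanh` residual branches, the per-block keep probability, the use of the final feature vector as logits, and the choice of the full network's output as the target distribution. The helper names (`make_block`, `sample_keep_mask`, `stimulative_kl`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def make_block(weight):
    """Hypothetical toy residual branch: an elementwise scaled tanh."""
    return lambda x: [weight * math.tanh(v) for v in x]


def forward(x, blocks, keep_mask):
    """Residual forward pass: x <- x + f(x) for kept blocks, identity otherwise."""
    for block, keep in zip(blocks, keep_mask):
        if keep:
            fx = block(x)
            x = [a + b for a, b in zip(x, fx)]
    return x


def sample_keep_mask(num_blocks, keep_prob, rng):
    """Assumed sampling rule: keep each block independently with keep_prob."""
    return [rng.random() < keep_prob for _ in range(num_blocks)]


def stimulative_kl(x, blocks, rng, keep_prob=0.5):
    """KL divergence between the full network and one sampled subnetwork."""
    full_mask = [True] * len(blocks)
    sub_mask = sample_keep_mask(len(blocks), keep_prob, rng)
    full_probs = softmax(forward(x, blocks, full_mask))  # assumed target
    sub_probs = softmax(forward(x, blocks, sub_mask))
    return kl_divergence(full_probs, sub_probs), sub_mask


rng = random.Random(0)
blocks = [make_block(w) for w in (0.5, -0.3, 0.8, 0.2)]
x = [1.0, -2.0, 0.5]
loss, mask = stimulative_kl(x, blocks, rng)
print("sampled mask:", mask, "KL loss:", loss)
```

In an actual training loop, a term like this would presumably be added to the main network's usual supervised loss, since the abstract describes it as extra supervision; the weighting between the two terms is not specified here.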
