Early exits are an important efficiency mechanism for deep neural networks that allows the forward pass to terminate before all layers have been processed. By halting inference early for less complex inputs that reach high confidence, early exits significantly reduce the required computation. Early-exit methods add trainable internal classifiers, which complicates the training process. However, there is no consistent evaluation of how early-exit models should be trained, and no unified training scheme for such models. Most existing methods either train the backbone network and the exit heads simultaneously or train the exit heads separately. We propose a third approach, in which the backbone is first trained on its own and then trained jointly with the exit heads. We therefore organize early-exit training strategies into three distinct categories and validate their performance and efficiency. In this benchmark, we perform both theoretical and empirical analyses of early-exit training regimes. We study the methods in terms of information flow, loss landscape, and numerical rank of activations, and gauge the suitability of each regime for various architectures and datasets.
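The early-exit inference mechanism described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the toy backbone, the layer shapes, and the `early_exit_forward` function are all hypothetical, and the confidence criterion (maximum softmax probability against a threshold) is one common choice among several.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy backbone: three dense layers, each followed by an
# attached exit head (a linear internal classifier). All names and
# shapes here are illustrative.
DIM, CLASSES = 8, 4
layers = [rng.normal(size=(DIM, DIM)) * 0.5 for _ in range(3)]
heads = [rng.normal(size=(DIM, CLASSES)) * 0.5 for _ in range(3)]

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def early_exit_forward(x, threshold=0.9):
    """Run the backbone layer by layer; return at the first exit head
    whose top-class confidence reaches the threshold."""
    h = x
    for i, (W, H) in enumerate(zip(layers, heads)):
        h = np.tanh(h @ W)       # one backbone layer
        probs = softmax(h @ H)   # internal classifier at this depth
        if probs.max() >= threshold:
            return i, probs      # "easy" input: terminate early
    return len(layers) - 1, probs  # fall through to the final exit

exit_idx, probs = early_exit_forward(rng.normal(size=DIM), threshold=0.5)
```

A lower threshold trades accuracy for compute: more inputs exit at shallow heads, so fewer layers are evaluated on average. Training must then ensure each internal classifier is well calibrated, which is exactly where the choice of training regime matters.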