Predictable adaptation of network depth can be an effective way to control inference latency and meet the resource constraints of various devices. However, previous adaptive depth networks provide neither general principles nor a formal explanation of why and which layers can be skipped; hence, their approaches are hard to generalize and require long and complex training steps. In this paper, we present a practical approach to adaptive depth networks that is applicable to various networks with minimal training effort. In our approach, every hierarchical residual stage is divided into two sub-paths, which are trained to acquire different properties through a simple self-distillation strategy. While the first sub-path is essential for hierarchical feature learning, the second is trained to refine the learned features and to minimize performance degradation when it is skipped. Unlike prior adaptive networks, our approach does not train every target sub-network iteratively. At test time, however, we can connect these sub-paths in a combinatorial manner to select sub-networks with various accuracy-efficiency trade-offs from a single network. We provide a formal rationale for why the proposed training method can reduce overall prediction errors while minimizing the impact of skipping sub-paths. We demonstrate the generality and effectiveness of our approach with convolutional neural networks and transformers.
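To make the combinatorial selection of sub-networks concrete, the following is a minimal sketch in plain Python and not the implementation used in this work. It assumes a toy network of four stages with two blocks per sub-path, scalar inputs, and a placeholder residual function; the names `Stage`, `make_toy_network`, `run`, and `enumerate_subnetworks` are hypothetical and chosen only for illustration. The self-distillation training step is not modeled; the sketch only shows how skipping the second sub-path of each stage yields sub-networks of different depths from one set of blocks.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Stand-in for a learned residual function F(x); scalars replace tensors here.
Block = Callable[[float], float]


class Stage:
    """One residual stage split into a mandatory and a skippable sub-path."""

    def __init__(self, first: List[Block], second: List[Block]) -> None:
        self.first = first    # always executed
        self.second = second  # may be skipped at test time

    def forward(self, x: float, skip_second: bool) -> float:
        blocks = self.first if skip_second else self.first + self.second
        for f in blocks:
            x = x + f(x)  # residual connection: identity plus block output
        return x

    def cost(self, skip_second: bool) -> int:
        """Number of blocks executed, used as a crude proxy for latency."""
        return len(self.first) + (0 if skip_second else len(self.second))


def make_toy_network(num_stages: int = 4, blocks_per_path: int = 2) -> List[Stage]:
    """Build a toy network (assumed sizes); every block computes 0.1 * x."""
    def block(x: float) -> float:
        return 0.1 * x
    return [
        Stage([block] * blocks_per_path, [block] * blocks_per_path)
        for _ in range(num_stages)
    ]


def run(stages: Sequence[Stage], x: float, skips: Sequence[bool]) -> float:
    """Run one sub-network; skips[i] says whether stage i skips its second sub-path."""
    for stage, skip in zip(stages, skips):
        x = stage.forward(x, skip)
    return x


def enumerate_subnetworks(stages: Sequence[Stage]) -> List[Tuple[Tuple[bool, ...], int]]:
    """List every skip configuration together with its block count."""
    configs = []
    for skips in product((False, True), repeat=len(stages)):
        total = sum(stage.cost(skip) for stage, skip in zip(stages, skips))
        configs.append((skips, total))
    return configs


if __name__ == "__main__":
    net = make_toy_network()
    for skips, total in enumerate_subnetworks(net):
        print(skips, total, run(net, 1.0, skips))
```

With N stages, this enumeration produces 2^N configurations from a single set of blocks. A deployment would then pick the configuration whose cost fits the target device; the block count used here is only a stand-in for a measured latency or FLOPs figure.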