Adaptive Depth Networks with Skippable Sub-Paths

Predictable adaptation of network depth can be an effective way to control inference latency and meet the resource constraints of various devices. However, previous adaptive depth networks provide neither general principles nor a formal explanation of why and which layers can be skipped; as a result, their approaches are hard to generalize and require long and complex training steps. In this paper, we present a practical approach to adaptive depth networks that is applicable to various networks with minimal training effort. In our approach, every hierarchical residual stage is divided into two sub-paths, which are trained to acquire different properties through a simple self-distillation strategy. While the first sub-path is essential for hierarchical feature learning, the second is trained to refine the learned features and to minimize performance degradation when it is skipped. Unlike prior adaptive networks, our approach does not train every target sub-network iteratively. At test time, however, these sub-paths can be connected in a combinatorial manner to select sub-networks with various accuracy-efficiency trade-offs from a single network. We provide a formal rationale for why the proposed training method can reduce overall prediction errors while minimizing the impact of skipping sub-paths. We demonstrate the generality and effectiveness of our approach with convolutional neural networks and transformers.
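The combinatorial selection described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the stage layout `STAGES`, the function names, and the use of executed block count as a stand-in for latency are all assumptions made for illustration. Each stage always runs its first sub-path, and its second sub-path is either kept or skipped, so N stages yield 2^N candidate sub-networks.

```python
from itertools import product

# Hypothetical layout (not from the paper): for each residual stage,
# (blocks in the mandatory first sub-path, blocks in the skippable second sub-path).
STAGES = [(2, 1), (2, 2), (3, 3), (2, 1)]


def enumerate_subnetworks(stages):
    """Return every (skip_flags, executed_blocks) pair, shallowest first.

    skip_flags[i] is True when the second sub-path of stage i is skipped.
    The first sub-path of every stage is always executed.
    """
    configs = []
    for skips in product([False, True], repeat=len(stages)):
        depth = sum(
            first + (0 if skip else second)
            for (first, second), skip in zip(stages, skips)
        )
        configs.append((skips, depth))
    return sorted(configs, key=lambda cfg: cfg[1])


def pick_subnetwork(stages, max_blocks):
    """Pick the deepest configuration whose block count fits the budget.

    Block count is used here as a crude proxy for inference cost; a real
    deployment would measure latency or FLOPs instead. Returns None when
    even the shallowest sub-network exceeds the budget.
    """
    feasible = [cfg for cfg in enumerate_subnetworks(stages) if cfg[1] <= max_blocks]
    return feasible[-1] if feasible else None


if __name__ == "__main__":
    for skips, depth in enumerate_subnetworks(STAGES):
        print(skips, depth)
    print("chosen for budget 12:", pick_subnetwork(STAGES, 12))
```

Because all candidates share the weights of a single network, choosing among them at test time only changes which second sub-paths are executed; no additional per-sub-network training is implied by this sketch.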

