Latency-aware Unified Dynamic Networks for Efficient Image Recognition

Dynamic computation has emerged as a promising way to improve the inference efficiency of deep networks. It selectively activates computational units, reducing unnecessary computation for each input sample. However, the practical efficiency of these dynamic models can deviate from theoretical predictions. This mismatch arises from: 1) the lack of a unified approach due to fragmented research; 2) a focus on algorithm design over critical scheduling strategies, especially on CUDA-enabled GPUs; and 3) the difficulty of measuring practical latency, given that most libraries cater to static operations. To address these issues, we propose Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping. To bridge the gap between theoretical and practical efficiency, LAUDNet combines algorithmic design with scheduling optimization, guided by a latency predictor that accurately estimates the latency of dynamic operators. We evaluate LAUDNet on multiple vision tasks and show that it reduces the latency of models such as ResNet-101 by over 50% on platforms such as V100, RTX3090, and TX2 GPUs, while achieving a favorable balance between accuracy and efficiency. Code is available at: https://www.github.com/LeapLabTHU/LAUDNet.
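
As a rough illustration of one of the three paradigms named above, dynamic layer skipping, the following minimal Python sketch runs a stack of toy residual blocks and lets a per-input gate decide whether each block is executed or bypassed through the identity shortcut. This is not the LAUDNet implementation: the function names (`residual_block`, `gate`, `forward_with_layer_skipping`), the scalar "weights", and the threshold-based gating rule are all hypothetical choices made only to show the idea.

```python
from typing import List, Tuple

Vector = List[float]


def residual_block(x: Vector, weight: float) -> Vector:
    """Toy residual branch F(x) = weight * x (stands in for a conv stack)."""
    return [weight * v for v in x]


def gate(x: Vector, threshold: float) -> bool:
    """Hypothetical gate: execute the block only if mean |x| exceeds threshold."""
    return sum(abs(v) for v in x) / len(x) > threshold


def forward_with_layer_skipping(
    x: Vector, weights: List[float], threshold: float
) -> Tuple[Vector, int]:
    """Run residual blocks in order, skipping those whose gate is closed.

    Returns the output features and the number of blocks actually executed.
    """
    executed = 0
    for w in weights:
        if gate(x, threshold):
            fx = residual_block(x, w)
            x = [a + b for a, b in zip(x, fx)]  # x + F(x)
            executed += 1
        # Gate closed: only the identity shortcut is used, x is unchanged.
    return x, executed


# Two inputs pass through the same two-block network but trigger
# different amounts of computation.
out_a, n_a = forward_with_layer_skipping([1.0, 1.0], [1.0, 1.0], threshold=0.5)
out_b, n_b = forward_with_layer_skipping([0.1, 0.1], [1.0, 1.0], threshold=0.5)
```

In this toy setting the first input executes both blocks while the second skips both. The number of executed blocks is only a theoretical cost proxy; as the abstract argues, the measured latency on a GPU also depends on how such conditional execution is scheduled.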

