Latency-aware Unified Dynamic Networks for Efficient Image Recognition

Dynamic computation has emerged as a promising way to improve the inference efficiency of deep networks. It selectively activates computational units, reducing unnecessary computation for each input sample. However, the actual efficiency of these dynamic models can deviate from theoretical predictions. This mismatch arises from: 1) the lack of a unified approach due to fragmented research; 2) the focus on algorithm design over critical scheduling strategies, especially in CUDA-enabled GPU contexts; and 3) the difficulty of measuring practical latency, given that most libraries cater to static operations. To address these issues, we present Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping. To bridge the gap between theoretical and practical efficiency, LAUDNet combines algorithmic design with scheduling optimization, guided by a latency predictor that accurately estimates the latency of dynamic operators. Experiments on multiple vision tasks show that LAUDNet reduces the latency of models such as ResNet-101 by over 50% on platforms such as V100, RTX3090, and TX2 GPUs. LAUDNet also achieves a strong balance between accuracy and efficiency. Code is available at: https://www.github.com/LeapLabTHU/LAUDNet.
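To make the "theoretical" side of this gap concrete, the following is a minimal sketch of how the multiply-accumulate (MAC) count of a single convolution shrinks under each of the three paradigms. It is not taken from the LAUDNet code base: the function names, the parameters `spatial_ratio`, `channel_ratio`, and `layer_executed`, and the layer shape are all hypothetical and chosen only for illustration.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """MACs of a dense convolution producing a height x width x c_out output."""
    return height * width * c_in * c_out * kernel * kernel


def dynamic_conv_macs(height, width, c_in, c_out, kernel=3,
                      spatial_ratio=1.0, channel_ratio=1.0,
                      layer_executed=True):
    """Theoretical MACs when only part of the layer is activated.

    All three knobs are illustrative assumptions, not the paper's API:
      spatial_ratio  -- fraction of output positions computed
                        (spatially adaptive computation)
      channel_ratio  -- fraction of output channels computed
                        (dynamic channel skipping)
      layer_executed -- False skips the whole layer
                        (dynamic layer skipping)
    """
    for ratio in (spatial_ratio, channel_ratio):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratios must lie in [0, 1]")
    if not layer_executed:
        return 0
    active_positions = round(height * width * spatial_ratio)
    active_channels = round(c_out * channel_ratio)
    return active_positions * c_in * active_channels * kernel * kernel


# Hypothetical 3x3 layer with a 56x56 output and 64 input/output channels.
dense = conv_macs(56, 56, 64, 64)
spatial = dynamic_conv_macs(56, 56, 64, 64, spatial_ratio=0.5)
channel = dynamic_conv_macs(56, 56, 64, 64, channel_ratio=0.5)
skipped = dynamic_conv_macs(56, 56, 64, 64, layer_executed=False)

for name, macs in [("dense", dense), ("spatial 50%", spatial),
                   ("channel 50%", channel), ("layer skipped", skipped)]:
    print(f"{name:>14}: {macs:>12,d} MACs ({macs / dense:.0%} of dense)")
```

These counts are only the theoretical savings. As the abstract notes, measured latency on a GPU can deviate from them, because sparse or irregular workloads incur scheduling and memory-access costs that a MAC count does not capture.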

