Dynamic computation has emerged as a promising avenue for improving the inference efficiency of deep networks. By selectively activating computational units, it avoids unnecessary computation on a per-sample basis. In practice, however, the efficiency of dynamic models often falls short of theoretical predictions. This gap stems from three factors: 1) fragmented research that lacks a unified approach; 2) a focus on algorithm design at the expense of scheduling strategies, which are critical on CUDA-enabled GPUs; and 3) the difficulty of measuring practical latency, since most libraries are built for static operations. To address these issues, we present Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping. To bridge the gap between theoretical and practical efficiency, LAUDNet couples algorithm design with scheduling optimization, guided by a latency predictor that accurately estimates the latency of dynamic operators. Experiments on multiple vision tasks show that LAUDNet reduces the latency of models such as ResNet-101 by over 50% on GPU platforms including the V100, RTX 3090, and TX2. LAUDNet also achieves a favorable balance between accuracy and efficiency. Code is available at: https://www.github.com/LeapLabTHU/LAUDNet.
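To make the gap between theoretical and practical efficiency concrete, the sketch below uses a toy cost model for spatially sparse computation. It is not LAUDNet's latency predictor. The tile-based execution, the per-tile gather/scatter overhead, and every constant (`t_pixel`, `t_tile`, `t_launch`) are hypothetical values chosen for illustration. A mask that activates 25% of pixels promises a 4x FLOPs saving, yet under this model pixel-level execution is slower than the dense baseline, and only coarser tiles recover most of the saving.

```python
import numpy as np


def active_tiles(mask: np.ndarray, block: int) -> int:
    """Count block x block tiles that contain at least one active pixel."""
    h, w = mask.shape
    if h % block or w % block:
        raise ValueError("feature-map size must be divisible by block")
    # tiles[i, a, j, c] == mask[i * block + a, j * block + c]
    tiles = mask.reshape(h // block, block, w // block, block)
    return int(tiles.any(axis=(1, 3)).sum())


def toy_latency(mask, block, t_pixel=1.0, t_tile=4.0, t_launch=50.0):
    """Hypothetical dynamic cost: one kernel launch, plus a fixed
    gather/scatter overhead per active tile, plus dense compute over
    every pixel of each active tile (inactive pixels inside it included)."""
    n_active = active_tiles(mask, block)
    return t_launch + n_active * (t_tile + block * block * t_pixel)


def dense_latency(mask, t_pixel=1.0, t_launch=50.0):
    """Hypothetical static baseline: no masking overhead, every pixel computed."""
    return t_launch + mask.size * t_pixel


if __name__ == "__main__":
    mask = np.zeros((32, 32), dtype=bool)
    mask[:16, :16] = True  # 25% of pixels active: 4x theoretical saving
    print("dense:", dense_latency(mask))  # dense: 1074.0
    for block in (1, 4, 8):
        print(block, toy_latency(mask, block))  # 1330.0, 370.0, 322.0
```

Coarser tiles amortize the per-tile overhead but can also activate pixels the mask left off. For example, a checkerboard mask activates every 2x2 tile, so no computation is saved at all. This trade-off is why FLOPs alone are a poor proxy for latency on real hardware.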