Dynamic computation has emerged as a promising way to improve the inference efficiency of deep networks. By selectively activating computational units, it avoids unnecessary computation for each input sample. However, the practical efficiency of dynamic models can deviate from theoretical predictions. This mismatch arises from: 1) the lack of a unified approach, owing to fragmented research; 2) a focus on algorithm design over critical scheduling strategies, especially on CUDA-enabled GPUs; and 3) the difficulty of measuring practical latency, given that most libraries cater to static operations. To address these issues, we propose Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping. To bridge the gap between theoretical and practical efficiency, LAUDNet combines algorithmic design with scheduling optimization, guided by a latency predictor that accurately estimates the latency of dynamic operators. We evaluate LAUDNet on multiple vision tasks and show that it reduces the latency of models such as ResNet-101 by over 50% on platforms such as V100, RTX3090, and TX2 GPUs. LAUDNet also achieves a favorable trade-off between accuracy and efficiency. Code is available at: https://www.github.com/LeapLabTHU/LAUDNet.
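As an illustration of one of the three paradigms named above, dynamic layer skipping can be sketched as follows. This is a minimal toy sketch and not the LAUDNet implementation: the helpers `make_block`, `gate`, and `forward`, the threshold-based gating rule, and the elementwise "branch" standing in for convolutional layers are all assumptions introduced here for illustration.

```python
def make_block(scale):
    """Return a toy residual branch (hypothetical stand-in for conv layers)."""
    def branch(x):
        return [scale * v for v in x]
    return branch


def gate(x, threshold):
    """Hypothetical gate: run the block only if the mean |activation| exceeds threshold."""
    return sum(abs(v) for v in x) / len(x) > threshold


def forward(x, blocks, thresholds):
    """Apply residual blocks, skipping any block whose gate is closed.

    Returns the output and the number of blocks actually executed.
    """
    executed = 0
    for branch, t in zip(blocks, thresholds):
        if gate(x, t):
            residual = branch(x)
            x = [a + b for a, b in zip(x, residual)]
            executed += 1
        # Otherwise the identity shortcut is used and the branch costs nothing.
    return x, executed


blocks = [make_block(0.5) for _ in range(3)]
out, executed = forward([1.0, -1.0], blocks, thresholds=[0.0, 10.0, 0.0])
print(out, executed)
```

In this toy run the second block is skipped, so only two of the three blocks execute. The count of executed blocks reflects only the theoretical saving; as the abstract notes, the latency actually observed on a GPU also depends on how the dynamic operators are scheduled.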