Over the past decade, deep learning models have advanced considerably, matching or even exceeding human-level performance on a range of visual perception tasks. This progress has sparked interest in deploying deep networks in real-world applications such as autonomous vehicles, mobile devices, robotics, and edge computing. However, state-of-the-art models usually demand significant computational resources, which leads to impractical power consumption, latency, or carbon emissions in real-world scenarios. This trade-off between effectiveness and efficiency has catalyzed a new research focus: computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost of inference. This review offers an extensive analysis of this rapidly evolving field by examining four key areas: 1) the development of static or dynamic lightweight backbone models for efficiently extracting discriminative deep representations; 2) specialized network architectures or algorithms tailored to specific computer vision tasks; 3) techniques for compressing deep learning models; and 4) strategies for deploying efficient deep networks on hardware platforms. Additionally, we provide a systematic discussion of the critical challenges in this domain, such as network architecture design, training schemes, practical efficiency, and more realistic model compression approaches, as well as potential future research directions.