Very deep convolutional neural networks (CNNs) have been firmly established as the primary methods for many computer vision tasks. However, most state-of-the-art CNNs are large, which results in high inference latency. Recently, depth-wise separable convolution has been proposed for image recognition tasks on computationally limited platforms such as robots and self-driving cars. Although it is much faster than its counterpart, regular convolution, it sacrifices accuracy. In this paper, we propose a novel decomposition approach based on singular value decomposition (SVD), namely depth-wise decomposition, for expanding regular convolutions into depth-wise separable convolutions while maintaining high accuracy. We show that our approach can be further generalized to the multi-channel and multi-layer cases using Generalized Singular Value Decomposition (GSVD) [59]. We conduct thorough experiments with the latest ShuffleNet V2 model [47] on both a randomly synthesized dataset and a large-scale image recognition dataset, ImageNet [10]. Our approach outperforms channel decomposition [73] on all datasets. More importantly, our approach improves the Top-1 accuracy of ShuffleNet V2 by approximately 2%.
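To make the relationship between a regular convolution and a depth-wise separable one concrete, the following is a minimal sketch of one way an SVD can factor a convolution kernel into a depth-wise kernel and a point-wise kernel. It is an illustration under stated assumptions, not necessarily the exact procedure of this paper: it assumes a weight tensor of shape `(c_out, c_in, k, k)`, keeps only the leading singular triplet for each input channel, and ignores the data-dependent (GSVD) generalization. The function names `depthwise_decompose` and `reconstruct` are hypothetical.

```python
import numpy as np


def depthwise_decompose(weight):
    """Rank-1 factorization of a conv kernel, one input channel at a time.

    weight: array of shape (c_out, c_in, kh, kw) (assumed layout).
    Returns a depth-wise kernel (c_in, kh, kw) and a point-wise kernel (c_out, c_in).
    """
    c_out, c_in, kh, kw = weight.shape
    depthwise = np.empty((c_in, kh, kw))
    pointwise = np.empty((c_out, c_in))
    for c in range(c_in):
        # All filters that read input channel c, flattened to a c_out x (kh*kw) matrix.
        m = weight[:, c].reshape(c_out, kh * kw)
        u, s, vt = np.linalg.svd(m, full_matrices=False)
        # Leading right singular vector becomes the spatial (depth-wise) filter.
        depthwise[c] = vt[0].reshape(kh, kw)
        # Scaled leading left singular vector becomes the 1x1 (point-wise) weights.
        pointwise[:, c] = s[0] * u[:, 0]
    return depthwise, pointwise


def reconstruct(depthwise, pointwise):
    """Regular kernel implied by the pair: W[o, c, i, j] = P[o, c] * D[c, i, j]."""
    return pointwise[:, :, None, None] * depthwise[None, :, :, :]


rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 4, 3, 3))  # synthetic kernel, for illustration only
dw, pw = depthwise_decompose(weight)
rel_err = np.linalg.norm(weight - reconstruct(dw, pw)) / np.linalg.norm(weight)
print(f"relative approximation error: {rel_err:.3f}")
```

For a random kernel the rank-1 fit is lossy, so the printed error is well above zero; a kernel that is already exactly separable is recovered up to floating-point precision. Minimizing the error on the layer's output rather than on its weights is what would require a data-dependent decomposition such as GSVD.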