We introduce Mixture-of-Depths (MoD) for Convolutional Neural Networks (CNNs), a novel approach that enhances the computational efficiency of CNNs by selectively processing channels based on their relevance to the current prediction. This method optimizes computational resources by dynamically selecting key channels in feature maps for focused processing within the convolutional blocks (Conv-Blocks), while skipping less relevant channels. Unlike conditional computation methods that require dynamic computation graphs, CNN MoD uses a static computation graph with fixed tensor sizes, which improves hardware efficiency. It speeds up both training and inference without requiring custom CUDA kernels, specialized loss functions, or fine-tuning. CNN MoD either matches the performance of traditional CNNs with reduced inference times, GMACs, and parameters, or exceeds their performance while maintaining similar inference times, GMACs, and parameters. For example, on ImageNet, ResNet86-MoD exceeds the performance of the standard ResNet50 by 0.45% with a 6% speedup on CPU and 5% on GPU. Moreover, ResNet75-MoD achieves the same performance as ResNet50 with a 25% speedup on CPU and 15% on GPU.
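The channel-selection mechanism described above can be sketched as follows. This is a minimal, hedged illustration assuming a simple squeeze-style router (global average pooling times a learned per-channel weight) and a fixed top-k selection; the function names `mod_block` and `w_router` are hypothetical and the paper's actual router and processing path may differ. The key point it demonstrates is the static computation graph: the processed slice always has shape `(k, H, W)`, regardless of input content.

```python
import numpy as np

def mod_block(x, w_router, k, process):
    """Illustrative sketch of a CNN Mixture-of-Depths block (not the paper's code).

    x        : feature map of shape (C, H, W)
    w_router : hypothetical learned per-channel routing weights, shape (C,)
    k        : fixed number of channels to process (static graph: k never varies)
    process  : function applied to the selected (k, H, W) slice, e.g. a conv stack
    """
    C, H, W = x.shape
    # Router: score each channel from its global average (an assumed simple scorer).
    pooled = x.mean(axis=(1, 2))                 # (C,)
    scores = pooled * w_router                   # (C,)
    topk = np.argsort(scores)[-k:]               # indices of the k most relevant channels
    gate = 1.0 / (1.0 + np.exp(-scores[topk]))   # sigmoid gate on selected channels
    out = x.copy()                               # skip path: unselected channels pass through
    # Focused processing on a fixed-size (k, H, W) tensor keeps tensor shapes static.
    out[topk] = x[topk] + gate[:, None, None] * process(x[topk])
    return out
```

Because `k` is constant, the same kernel launches and tensor shapes are reused every step, which is what lets the method run on standard hardware without dynamic-graph machinery.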