Many convolutional neural networks (CNNs) rely on progressive downsampling of their feature maps to increase the network's receptive field and decrease computational cost. However, this comes at the price of losing granularity in the feature maps, which limits the ability to correctly understand images or recover fine detail in dense prediction tasks. To address this, common practice is to replace the last few downsampling operations in a CNN with dilated convolutions, which retains the feature map resolution without reducing the receptive field, albeit at increased computational cost. The output feature resolution thus controls the trade-off between predictive performance and cost. By either downsampling the entire feature map regularly or not downsampling it at all, existing work implicitly treats all regions of the input image and subsequent feature maps as equally important, which generally does not hold. We propose an adaptive downsampling scheme that generalizes this idea by processing informative regions at a higher resolution than less informative ones. In a variety of experiments, we demonstrate the versatility of our adaptive downsampling strategy and empirically show that it improves the cost-accuracy trade-off of various established CNNs.
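The trade-off between strided and dilated convolutions can be made concrete with a small calculation. The following is a minimal sketch, not the method proposed here: it compares a hypothetical two-layer stack that downsamples with stride 2 against one that keeps stride 1 and dilates the second layer instead. The `Conv` class and `analyze` function are illustrative names, channel counts are ignored, and cost is approximated as the number of output positions times the kernel area.

```python
from dataclasses import dataclass


@dataclass
class Conv:
    """Hypothetical description of a square 2D convolution layer."""
    kernel: int
    stride: int = 1
    dilation: int = 1


def analyze(layers, size):
    """Return (output size, receptive field, relative cost) for a conv stack.

    Assumes square feature maps, "same"-style padding, and ignores channels.
    """
    rf, jump, cost = 1, 1, 0
    for c in layers:
        span = c.dilation * (c.kernel - 1)   # extent covered by the kernel, minus one
        pad = span // 2                      # padding that preserves size at stride 1
        size = (size + 2 * pad - span - 1) // c.stride + 1
        rf += span * jump                    # receptive field grows with the current jump
        jump *= c.stride                     # distance between adjacent outputs, in input pixels
        cost += size * size * c.kernel ** 2  # multiply-accumulates per channel pair
    return size, rf, cost


# Downsample in the first layer, then apply a regular 3x3 convolution.
strided = analyze([Conv(3, stride=2), Conv(3)], size=64)
# Keep full resolution and dilate the second layer to compensate.
dilated = analyze([Conv(3), Conv(3, dilation=2)], size=64)

print("strided:", strided)
print("dilated:", dilated)
```

Under these assumptions both stacks reach the same receptive field, while the dilated variant keeps twice the resolution per side and therefore pays roughly four times the cost. Adaptive downsampling aims to pay that higher cost only in the regions where the extra resolution is useful.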