In convolutional neural networks, convolutions are conventionally performed using a square kernel with a fixed $N \times N$ receptive field (RF). However, what matters most to the network is the effective receptive field (ERF), which indicates the extent to which input pixels contribute to an output pixel. Inspired by the observation that ERFs typically exhibit a Gaussian distribution, we propose a Gaussian Mask convolutional kernel (GMConv) in this work. Specifically, GMConv uses a Gaussian function to generate a concentrically symmetric mask that is placed over the kernel to refine the RF. GMConv can directly replace the standard convolutions in existing CNNs and can be trained end-to-end by standard back-propagation. We evaluate our approach through extensive experiments on image classification and object detection tasks. Across several tasks and standard base models, our approach compares favorably against the standard convolution. For instance, using GMConv in AlexNet and ResNet-50 boosts the top-1 accuracy on ImageNet classification by 0.98% and 0.85%, respectively.
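To make the masking idea concrete, the following is a minimal sketch of a Gaussian mask applied to a square kernel, written in plain Python. It is an illustration under stated assumptions, not the GMConv implementation: the abstract does not give the exact mask parameterization, so the isotropic form $\exp(-d^2 / (2\sigma^2))$, the fixed scalar `sigma`, and the helper names `gaussian_mask`, `masked_kernel`, and `conv2d_valid` are all hypothetical. In a trainable layer, `sigma` would presumably be a parameter updated by back-propagation together with the kernel weights.

```python
import math


def gaussian_mask(size, sigma):
    """Return a size x size mask whose value depends only on the distance
    d from the kernel centre: exp(-d^2 / (2 * sigma^2)).

    Assumption: an isotropic Gaussian centred on the kernel; the exact
    form used by GMConv is not specified in the abstract.
    """
    centre = (size - 1) / 2.0
    return [
        [
            math.exp(-((i - centre) ** 2 + (j - centre) ** 2) / (2.0 * sigma ** 2))
            for j in range(size)
        ]
        for i in range(size)
    ]


def masked_kernel(kernel, sigma):
    """Multiply a square kernel element-wise by the Gaussian mask."""
    size = len(kernel)
    mask = gaussian_mask(size, sigma)
    return [[kernel[i][j] * mask[i][j] for j in range(size)] for i in range(size)]


def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation (the operation CNN layers
    call convolution), with stride 1 and no padding."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# Usage: mask a 3x3 all-ones kernel and apply it to a 4x4 all-ones image.
kernel = [[1.0] * 3 for _ in range(3)]
image = [[1.0] * 4 for _ in range(4)]
output = conv2d_valid(image, masked_kernel(kernel, sigma=1.0))
```

With a small `sigma` the mask suppresses the kernel's corners and concentrates the response near the centre, while a very large `sigma` leaves the kernel essentially unchanged, so in this sketch the standard square kernel is recovered as a limiting case.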