In convolutional neural networks, convolutions are conventionally performed with a square kernel that has a fixed $N \times N$ receptive field (RF). However, what matters most to the network is the effective receptive field (ERF), which indicates the extent to which each input pixel contributes to an output pixel. Motivated by the observation that ERFs typically exhibit a Gaussian distribution, we propose a Gaussian Mask convolutional kernel (GMConv). Specifically, GMConv uses a Gaussian function to generate a concentrically symmetric mask that is placed over the kernel to refine the RF. GMConv can directly replace the standard convolutions in existing CNNs and can be trained end-to-end by standard back-propagation. We evaluate our approach through extensive experiments on image classification and object detection. Across several tasks and standard base models, our approach compares favorably against the standard convolution. For instance, using GMConv in AlexNet and ResNet-50 improves top-1 accuracy on ImageNet classification by 0.98% and 0.85%, respectively.
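To make the masking idea concrete, the following is a minimal sketch rather than the actual GMConv implementation. It assumes an isotropic Gaussian centered on the kernel, a single width parameter `sigma`, and that "placing the mask over the kernel" means element-wise multiplication; the function names `gaussian_mask` and `apply_mask` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_mask(n, sigma):
    """Build an n x n mask whose value depends only on the distance
    from the kernel center (assumed isotropic Gaussian, peak value 1)."""
    c = (n - 1) / 2.0  # center coordinate of the kernel
    return [
        [
            math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2.0 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def apply_mask(kernel, mask):
    """Element-wise product of kernel weights and mask (an assumption
    about how the mask is combined with the kernel)."""
    return [
        [w * m for w, m in zip(k_row, m_row)]
        for k_row, m_row in zip(kernel, mask)
    ]


# Illustrative 3 x 3 kernel with all weights equal to 1.
kernel = [[1.0] * 3 for _ in range(3)]
mask = gaussian_mask(3, sigma=1.0)
masked_kernel = apply_mask(kernel, mask)
```

In this sketch the center weight is left unchanged while weights farther from the center are attenuated, so a smaller `sigma` yields a tighter, more circular receptive field inside the square kernel. In a trainable setting, `sigma` would presumably be a learnable parameter updated by back-propagation together with the kernel weights.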