In computer vision, most convolutional neural networks (CNNs) are built from a stack of feature extraction layers, each of which downsamples its input to a lower resolution. This downsampling, commonly referred to as pooling, is an essential operation in CNNs: it improves robustness against transformations, reduces the number of trainable parameters, increases the receptive field, and lowers computation time. Because pooling is lossy yet necessary for extracting high-level information from low-level representations, it should preserve the most prominent information from the preceding activations to improve network discriminability. Standard pooling is usually performed with dense methods, such as max pooling or average pooling, or with strided convolutional kernels. In this paper, we propose FlexPooling, a simple yet effective adaptive pooling method that generalizes average pooling by learning a weighted average over activations jointly with the rest of the network. We further show that attaching Simple Auxiliary Classifiers (SAC) to the CNN improves performance and demonstrates the effectiveness of the proposed method compared with standard pooling methods. Experiments on multiple popular image classification datasets show that FlexPooling consistently outperforms the baseline networks, improving accuracy by approximately 1 to 3 percent.
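To make the idea of pooling as a weighted average concrete, the following is a minimal forward-pass sketch in plain Python. It is not the FlexPooling implementation: the abstract does not specify how the weights are parameterized, so the softmax normalization, the non-overlapping windows, the single shared weight vector, and the names `softmax` and `weighted_avg_pool2d` are all assumptions made for illustration. In a real network the logits would be trainable parameters updated by backpropagation together with the other layers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_avg_pool2d(x, logits, k=2):
    """Pool a 2D feature map with non-overlapping k x k windows.

    x      : list of rows (H x W); H and W must be divisible by k.
    logits : k*k values. ASSUMPTION: the window weights are
             softmax(logits), so they are positive and sum to 1.
             This parameterization is illustrative, not from the paper.
    """
    if len(logits) != k * k:
        raise ValueError("expected k*k logits")
    h, w = len(x), len(x[0])
    if h % k or w % k:
        raise ValueError("feature map size must be divisible by k")
    weights = softmax(logits)
    out = []
    for i in range(0, h, k):
        row = []
        for j in range(0, w, k):
            # Flatten the window row by row, matching the weight order.
            window = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(sum(wt * v for wt, v in zip(weights, window)))
        out.append(row)
    return out


feature_map = [
    [1.0, 2.0, 0.0, 4.0],
    [3.0, 4.0, 2.0, 2.0],
    [0.0, 0.0, 1.0, 1.0],
    [8.0, 0.0, 1.0, 1.0],
]

# Equal logits give equal weights, i.e. ordinary average pooling.
uniform = weighted_avg_pool2d(feature_map, [0.0] * 4)

# Non-uniform logits shift each output toward one window position
# (here the bottom-right element of every window).
peaked = weighted_avg_pool2d(feature_map, [0.0, 0.0, 0.0, 5.0])
```

With all logits equal, the sketch reduces to average pooling, which is the sense in which a learned weighted average generalizes it. Under the softmax assumption the output always stays within the range of the values in each window, so a constant window is left unchanged whatever the weights are.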