Downsampling layers, including pooling and strided convolutions, are crucial components of convolutional neural network architectures: they determine both the granularity/scale at which image features are analysed and the receptive field size of a given layer. To understand their effect, we analyse the performance of models trained independently with each pooling configuration on CIFAR10, using a ResNet20 network, and show that the position of the downsampling layers can strongly influence the performance of a network and that predefined downsampling configurations are not optimal. Network Architecture Search (NAS) might be used to optimize downsampling configurations as a hyperparameter. However, we find that common one-shot NAS based on a single SuperNet does not work for this problem. We argue that this is because a SuperNet trained to find the optimal pooling configuration fully shares its parameters among all pooling configurations. This makes training hard, because learning some configurations can harm the performance of others. Therefore, we propose a balanced mixture of SuperNets that automatically associates pooling configurations with different weight models and helps to reduce weight sharing and the mutual influence of pooling configurations on the SuperNet parameters. We evaluate our approach on CIFAR10, CIFAR100, and Food101 and show that in all cases our model outperforms other approaches and improves over the default pooling configurations.
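To make the notion of a "pooling configuration" concrete, the following minimal sketch enumerates candidate placements of downsampling layers and the feature-map resolution each placement produces. It is an illustration under stated assumptions, not the search space defined in the paper: it assumes a CIFAR-style ResNet20 with 9 residual blocks, exactly two stride-2 downsampling steps placed on distinct blocks, and a default placement at blocks 3 and 6 (0-indexed). The names `pooling_configurations` and `resolutions` are hypothetical.

```python
from itertools import combinations

# Assumptions for illustration only (not taken from the abstract):
NUM_BLOCKS = 9        # CIFAR-style ResNet20: 3 stages x 3 residual blocks
NUM_DOWNSAMPLES = 2   # two stride-2 downsampling steps
INPUT_SIZE = 32       # CIFAR10 images are 32x32 pixels
DEFAULT_CONFIG = (3, 6)  # assumed default: downsample at the start of stages 2 and 3


def pooling_configurations(num_blocks=NUM_BLOCKS, num_downsamples=NUM_DOWNSAMPLES):
    """Return every way to place the downsampling steps on distinct blocks."""
    return list(combinations(range(num_blocks), num_downsamples))


def resolutions(config, num_blocks=NUM_BLOCKS, input_size=INPUT_SIZE):
    """Return the spatial resolution of the feature map after each block."""
    size = input_size
    per_block = []
    for block in range(num_blocks):
        if block in config:
            size //= 2  # a stride-2 layer halves height and width
        per_block.append(size)
    return per_block


if __name__ == "__main__":
    configs = pooling_configurations()
    print(f"{len(configs)} candidate configurations")
    print("default:", resolutions(DEFAULT_CONFIG))
    print("early  :", resolutions((0, 1)))
    print("late   :", resolutions((7, 8)))
```

Under these assumptions, an early placement such as `(0, 1)` processes almost every block at low resolution, whereas a late placement such as `(7, 8)` keeps most blocks at full resolution. A single SuperNet would have to serve all such placements with one shared set of weights, which is the sharing the abstract identifies as the source of difficulty.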