Resolution in deep convolutional neural networks (CNNs) is typically bounded by the receptive field size, which is determined by the filter sizes and by subsampling layers or strided convolutions on feature maps. The optimal resolution may vary significantly depending on the dataset. Modern CNNs hard-code their resolution hyper-parameters in the network architecture, which makes tuning such hyper-parameters cumbersome. We propose to do away with hard-coded resolution hyper-parameters and aim to learn the appropriate resolution from data. We use scale-space theory to obtain a self-similar parametrization of filters and make use of the N-Jet: a truncated Taylor series that approximates a filter by a learned combination of Gaussian derivative filters. The parameter sigma of the Gaussian basis controls both the amount of detail the filter encodes and the spatial extent of the filter. Since sigma is a continuous parameter, we can optimize it with respect to the loss. The proposed N-Jet layer achieves comparable performance when used in state-of-the-art architectures, while learning the correct resolution in each layer automatically. We evaluate our N-Jet layer on both classification and segmentation, and we show that learning sigma is especially beneficial for inputs at multiple sizes.
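To make the role of sigma concrete, the following is a minimal NumPy sketch of a filter built as a weighted combination of 2D Gaussian derivative basis filters. It is an illustration under stated assumptions, not the authors' implementation: the function names `gaussian_derivative_1d` and `njet_filter`, the truncation at three standard deviations, the restriction to derivative orders 0 to 2, and the normalisation are all hypothetical choices. Here sigma and the weights are fixed inputs; in a differentiable framework they would be trainable parameters optimized with respect to the loss.

```python
import numpy as np


def gaussian_derivative_1d(sigma, order, truncate=3.0):
    """Sampled 1D Gaussian derivative of the given order (0, 1 or 2).

    The support grows with sigma (assumed cut-off: truncate * sigma),
    so sigma sets both the smoothness and the spatial extent.
    """
    radius = int(np.ceil(truncate * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    g /= g.sum()  # normalise the zeroth-order kernel to unit sum
    if order == 0:
        return g
    if order == 1:
        return -x / sigma**2 * g
    if order == 2:
        return (x**2 - sigma**2) / sigma**4 * g
    raise ValueError("only orders 0..2 are sketched here")


def njet_filter(alphas, sigma, truncate=3.0):
    """Combine separable Gaussian derivative bases into one 2D filter.

    alphas maps (order_y, order_x) -> weight; in a trained layer these
    weights (and sigma) would be learned rather than given.
    """
    size = 2 * int(np.ceil(truncate * sigma)) + 1
    kernel = np.zeros((size, size))
    for (order_y, order_x), weight in alphas.items():
        basis = np.outer(
            gaussian_derivative_1d(sigma, order_y, truncate),
            gaussian_derivative_1d(sigma, order_x, truncate),
        )
        kernel += weight * basis
    return kernel


# Hypothetical weights: a blur term plus an x-derivative term.
weights = {(0, 0): 0.5, (0, 1): 1.0}
small = njet_filter(weights, sigma=1.0)
large = njet_filter(weights, sigma=2.0)
print(small.shape, large.shape)
```

The same weights produce a larger and smoother filter as sigma increases, which is the sense in which a single continuous parameter controls both the encoded detail and the spatial extent.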