As large as it gets: Learning infinitely large Filters via Neural Implicit Functions in the Fourier Domain

Motivated by the recent trend towards larger receptive fields for more context-aware neural networks in vision applications, we investigate how large these receptive fields really need to be. To facilitate such a study, several challenges need to be addressed, most importantly: (i) we need to provide an effective way for models to learn large filters (potentially as large as the input data) without increasing their memory consumption during training or inference, (ii) the study of filter sizes has to be decoupled from other effects such as the network width or the number of learnable parameters, and (iii) the employed convolution operation should be a plug-and-play module that can replace any conventional convolution in a Convolutional Neural Network (CNN) and allow for an efficient implementation in current frameworks. To facilitate such models, we propose to learn not spatial but frequency representations of filter weights as neural implicit functions, such that even infinitely large filters can be parameterized by only a few learnable weights. The resulting neural implicit frequency CNNs are the first models to achieve results on par with the state of the art on large image classification benchmarks while executing convolutions solely in the frequency domain, and they can be employed within any CNN architecture. They allow us to provide an extensive analysis of the learned receptive fields. Interestingly, our analysis shows that, although the proposed networks could learn very large convolution kernels, the learned filters in practice translate into well-localized and relatively small convolution kernels in the spatial domain.
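
The core idea, a filter defined as a function of frequency coordinates and applied by pointwise multiplication in the Fourier domain, can be illustrated with a minimal sketch. The code below is not the architecture used in this work; it is a single-channel NumPy toy under our own assumptions. The tiny two-layer MLP with a tanh activation and the names `init_mlp`, `frequency_filter`, `fourier_conv`, and `spatial_kernel` are all hypothetical and chosen only for illustration. It shows that the number of learnable weights does not depend on the resolution at which the filter is sampled, and that the frequency-domain product corresponds to a circular convolution in the spatial domain.

```python
import numpy as np


def init_mlp(rng, hidden=16):
    # Hypothetical tiny MLP: maps a frequency coordinate (u, v)
    # to the (real, imaginary) parts of the filter at that frequency.
    return {
        "w1": rng.normal(0.0, 1.0, (2, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 2)),
        "b2": np.zeros(2),
    }


def num_params(params):
    # Total number of learnable weights; independent of any image size.
    return sum(p.size for p in params.values())


def frequency_filter(params, height, width):
    # Sample the implicit function on the rFFT grid of a (height, width) image.
    u = np.fft.fftfreq(height)   # vertical frequencies, length `height`
    v = np.fft.rfftfreq(width)   # horizontal frequencies, length width // 2 + 1
    uu, vv = np.meshgrid(u, v, indexing="ij")
    coords = np.stack([uu, vv], axis=-1)             # (H, W // 2 + 1, 2)
    hidden = np.tanh(coords @ params["w1"] + params["b1"])
    out = hidden @ params["w2"] + params["b2"]       # (H, W // 2 + 1, 2)
    return out[..., 0] + 1j * out[..., 1]            # complex-valued filter


def fourier_conv(image, filt):
    # Convolution theorem: a pointwise product of spectra corresponds to a
    # circular convolution of the signals in the spatial domain.
    spectrum = np.fft.rfft2(image)
    return np.fft.irfft2(spectrum * filt, s=image.shape)


def spatial_kernel(filt, shape):
    # Inverse transform, to inspect the filter in the spatial domain.
    return np.fft.irfft2(filt, s=shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_mlp(rng)
    for size in (16, 64, 256):
        image = rng.normal(size=(size, size))
        filt = frequency_filter(params, size, size)
        result = fourier_conv(image, filt)
        print(size, result.shape, num_params(params))
```

In this sketch the same weights produce a filter for any input resolution, so the filter can span the whole input without adding parameters. Inspecting `spatial_kernel(filt, shape)` is one simple way to check how localized such a filter is in the spatial domain; with randomly initialized weights, as here, it says nothing about what a trained network would learn.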
