Dilated convolution, which expands the receptive field by inserting gaps between consecutive kernel elements, is widely used in computer vision. In this study, we propose three strategies that improve individual phases of dilated convolution from the perspective of spectral analysis. Departing from the conventional practice of fixing a global dilation rate as a hyperparameter, we introduce Frequency-Adaptive Dilated Convolution (FADC), which dynamically adjusts dilation rates spatially based on local frequency components. We then design two plug-in modules that directly enhance effective bandwidth and receptive field size. The Adaptive Kernel (AdaKern) module decomposes convolution weights into low-frequency and high-frequency components and dynamically adjusts the ratio between them on a per-channel basis. By increasing the high-frequency part of the convolution weights, AdaKern captures more high-frequency components, thereby improving effective bandwidth. The Frequency Selection (FreqSelect) module balances high- and low-frequency components in feature representations through spatially variant reweighting. It suppresses high frequencies in the background to encourage FADC to learn a larger dilation, thereby enlarging the receptive field. Extensive experiments on segmentation and object detection consistently validate the efficacy of our approach. The code is publicly available at https://github.com/Linwei-Chen/FADC.
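To make the two mechanisms above concrete, the following is a minimal one-dimensional sketch in plain Python, not the authors' implementation. The function names `dilated_conv1d` and `reweight_kernel` are hypothetical. Splitting a kernel into its mean (treated as the low-frequency part) and the residual (treated as the high-frequency part) is an assumption made for illustration, as are the fixed scales that stand in for the dynamically predicted per-channel ratios.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D dilated correlation.

    Kernel taps are placed `dilation` samples apart, so one output sample
    covers (len(kernel) - 1) * dilation + 1 input samples.
    """
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def reweight_kernel(kernel, low_scale, high_scale):
    """Recombine a kernel from separately scaled frequency parts.

    ASSUMPTION (illustrative only): the kernel mean is used as the
    low-frequency part and the residual as the high-frequency part.
    The scales are fixed here; an adaptive module would predict them.
    """
    mean = sum(kernel) / len(kernel)
    return [low_scale * mean + high_scale * (w - mean) for w in kernel]


ramp = [1, 2, 3, 4, 5, 6, 7]
edge = [1, 0, -1]

# Dilation 1 spans 3 samples per output; dilation 2 spans 5 samples.
narrow = dilated_conv1d(ramp, edge, dilation=1)
wide = dilated_conv1d(ramp, edge, dilation=2)

# Doubling the high-frequency scale amplifies deviations from the mean.
sharpened = reweight_kernel([1, 2, 3], low_scale=1, high_scale=2)
```

With the same three-tap kernel, a larger dilation covers a wider input span at the cost of fewer valid outputs, which is the trade-off a spatially adaptive dilation rate would tune per location. Raising `high_scale` relative to `low_scale` increases the weight of the residual part of the kernel, which is the sense in which the sketch boosts high-frequency response.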