Dilated convolution, which expands the receptive field by inserting gaps between consecutive kernel elements, is widely employed in computer vision. In this study, we propose three strategies to improve individual phases of dilated convolution from the perspective of spectral analysis. Departing from the conventional practice of fixing a global dilation rate as a hyperparameter, we introduce Frequency-Adaptive Dilated Convolution (FADC), which dynamically adjusts the dilation rate at each spatial location based on local frequency components. We then design two plug-in modules that directly enhance the effective bandwidth and the receptive field size. The Adaptive Kernel (AdaKern) module decomposes convolution weights into low-frequency and high-frequency components and dynamically adjusts the ratio between them on a per-channel basis. By increasing the high-frequency part of the convolution weights, AdaKern captures more high-frequency components and thereby improves the effective bandwidth. The Frequency Selection (FreqSelect) module balances high- and low-frequency components in feature representations through spatially variant reweighting. It suppresses high frequencies in the background, which encourages FADC to learn larger dilation rates there and thereby enlarges the receptive field. Extensive experiments on segmentation and object detection consistently validate the efficacy of our approach. The code is publicly available at \url{https://github.com/Linwei-Chen/FADC}.
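To make the AdaKern description more concrete, the following is a minimal sketch in plain Python rather than the released implementation. It assumes that the low-frequency component of a kernel is its mean value and that the high-frequency component is the residual; the abstract does not specify the decomposition, so this choice is an assumption. The names `split_kernel` and `adakern_reweight` and the fixed scalars `lam_low` and `lam_high` are hypothetical. In the method described above, the ratio is adjusted dynamically per channel instead of being fixed by hand.

```python
# Illustrative sketch of the AdaKern idea; all names are hypothetical and
# do not come from the FADC repository.

def split_kernel(kernel):
    """Split a 2D kernel into a low-frequency part and a high-frequency part.

    Assumption: the low-frequency part is the kernel mean (a box filter),
    and the high-frequency part is the zero-mean residual.
    """
    values = [v for row in kernel for v in row]
    mean = sum(values) / len(values)
    low = [[mean for _ in row] for row in kernel]
    high = [[v - mean for v in row] for row in kernel]
    return low, high


def adakern_reweight(kernel, lam_low, lam_high):
    """Recombine the two parts with separate weights for one channel."""
    low, high = split_kernel(kernel)
    return [
        [lam_low * l + lam_high * h for l, h in zip(low_row, high_row)]
        for low_row, high_row in zip(low, high)
    ]


# One 3x3 kernel standing in for a single channel.
kernel = [
    [0.0, 1.0, 0.0],
    [1.0, 4.0, 1.0],
    [0.0, 1.0, 0.0],
]

# Equal weights reproduce the original kernel.
identity = adakern_reweight(kernel, lam_low=1.0, lam_high=1.0)

# A larger high-frequency weight amplifies deviations from the mean,
# while the kernel sum (its response to a constant input) is unchanged.
sharper = adakern_reweight(kernel, lam_low=1.0, lam_high=2.0)
```

Under this assumed decomposition the residual sums to zero, so changing `lam_high` alters only how strongly the kernel responds to variation in its input, which is the sense in which a larger high-frequency weight widens the effective bandwidth. To apply the sketch per channel, one would call `adakern_reweight` once per channel with that channel's own pair of weights.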