Dynamic Convolution (DY-Conv) has shown promising performance by enabling adaptive weight selection through multiple parallel weights combined with an attention mechanism. However, the frequency responses of these weights tend to be highly similar, resulting in high parameter costs but limited adaptability. In this work, we introduce Frequency Dynamic Convolution (FDConv), a novel approach that mitigates these limitations by learning a fixed budget of parameters in the Fourier domain. FDConv divides this budget into frequency-based groups with disjoint Fourier indices, enabling the construction of frequency-diverse weights without increasing the parameter cost. To further enhance adaptability, we propose Kernel Spatial Modulation (KSM) and Frequency Band Modulation (FBM). KSM dynamically adjusts the frequency response of each filter at the spatial level, while FBM decomposes weights into distinct frequency bands and modulates them dynamically based on local content. Extensive experiments on object detection, segmentation, and classification validate the effectiveness of FDConv. When applied to ResNet-50, FDConv achieves superior performance with a modest increase of +3.6M parameters, outperforming previous methods that require substantial increases in parameter budgets (e.g., CondConv +90M, KW +76.5M). Moreover, FDConv integrates seamlessly into a variety of architectures, including ConvNeXt and Swin-Transformer, offering a flexible and efficient solution for modern vision tasks. The code is publicly available at https://github.com/Linwei-Chen/FDConv.
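To make the idea of frequency-based groups with disjoint Fourier indices concrete, the following is a minimal NumPy sketch and not the released FDConv implementation. It assumes a single 2-D grid of complex Fourier coefficients as the fixed parameter budget, an ordering of indices by radial frequency, equal-sized groups, and a real-part projection after the inverse DFT; the function name `build_frequency_diverse_weights` and all of these choices are illustrative assumptions rather than details stated above.

```python
import numpy as np

def build_frequency_diverse_weights(spectrum, num_groups):
    """Split one Fourier-domain parameter grid into disjoint index groups and
    map each group to a spatial weight with the inverse 2-D DFT.

    Hypothetical helper for illustration only; not the authors' code.
    """
    h, w = spectrum.shape
    # Radial frequency of every DFT index (cycles per sample).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2).ravel()
    # Order indices from low to high frequency, then cut into disjoint groups.
    order = np.argsort(radius, kind="stable")
    groups = np.array_split(order, num_groups)
    weights = []
    for idx in groups:
        mask = np.zeros(h * w, dtype=bool)
        mask[idx] = True
        # Keep only this group's coefficients; all others are set to zero.
        masked = np.where(mask.reshape(h, w), spectrum, 0)
        # Assumption: project to a real-valued weight by taking the real part.
        weights.append(np.fft.ifft2(masked).real)
    return np.stack(weights)

# The parameter budget is the size of `spectrum`, independent of num_groups.
rng = np.random.default_rng(0)
spectrum = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
weights = build_frequency_diverse_weights(spectrum, num_groups=4)
```

Because the groups partition the Fourier indices, the resulting weights occupy different parts of the spectrum, and their sum equals the real part of the inverse DFT of the full grid. Raising `num_groups` yields more weights while the number of learned coefficients stays fixed.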