Regularization techniques help prevent overfitting and therefore improve the ability of convolutional neural networks (CNNs) to generalize. One cause of overfitting is complex co-adaptation among different parts of the network, which makes the CNN depend on their joint response rather than encouraging each part to learn a useful feature representation independently. Frequency-domain manipulation is a powerful strategy for modifying data with temporal and spatial coherence, because it operates on a frequency decomposition of that data. This work introduces Spectral Wavelet Dropout (SWD), a novel regularization method with two variants, 1D-SWD and 2D-SWD. Both variants improve CNN generalization by randomly dropping detail frequency bands in the discrete wavelet decomposition of feature maps. This distinguishes our approach from the existing Spectral "Fourier" Dropout (2D-SFD), which eliminates coefficients in the Fourier domain. Notably, SWD requires only a single hyperparameter, whereas SFD requires two. We also extend the literature by implementing a one-dimensional version of Spectral "Fourier" Dropout (1D-SFD), which enables a comprehensive comparison. Our evaluation shows that both the 1D and 2D SWD variants achieve performance competitive with 1D-SFD and 2D-SFD on the CIFAR-10/100 benchmarks. Moreover, 1D-SWD has significantly lower computational complexity than both 1D-SFD and 2D-SFD. On the Pascal VOC object detection benchmark, the SWD variants outperform 1D-SFD and 2D-SFD and have lower computational complexity during training.
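To make the idea of "dropping detail bands of a discrete wavelet decomposition" concrete, here is a minimal, dependency-free sketch for a single 1D signal. It is not the authors' implementation, and several details are assumptions: the Haar wavelet, a fixed decomposition depth `levels`, each detail band zeroed independently with probability `p` (with no `1/(1-p)` rescaling), the approximation band always kept, and identity behavior at evaluation time. The names `haar_forward`, `haar_inverse`, and `wavelet_dropout_1d` are hypothetical. The abstract does not say how feature maps are arranged into 1D signals for 1D-SWD, nor which quantity is its single hyperparameter; here `p` is treated as the tunable one and `levels` as a fixed design choice.

```python
import math
import random
from typing import List, Optional, Tuple

# Orthonormal Haar scaling factor 1/sqrt(2).
_S = 1.0 / math.sqrt(2.0)


def haar_forward(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT. len(x) must be even."""
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) * _S for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) * _S for k in range(half)]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Invert one Haar level by interleaving the reconstructed pairs."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def wavelet_dropout_1d(
    x: List[float],
    levels: int,
    p: float,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical 1D wavelet-band dropout (a sketch, not the paper's code).

    Decomposes x into `levels` Haar detail bands plus one approximation band,
    zeroes each detail band independently with probability p, and inverts
    the transform. The approximation band is always kept.
    """
    if not training or p == 0.0:
        return list(x)  # identity at evaluation time, like standard dropout
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    rng = rng if rng is not None else random.Random()

    # Forward multi-level DWT: keep splitting the approximation band.
    approx = list(x)
    details: List[List[float]] = []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)  # details[0] is the finest band

    # Drop whole detail bands (not individual coefficients).
    details = [[0.0] * len(d) if rng.random() < p else d for d in details]

    # Inverse DWT, coarsest level first.
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx
```

For example, `wavelet_dropout_1d([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], levels=3, p=1.0)` drops every detail band and returns a signal whose entries are all (up to floating-point rounding) the mean, 4.5. With `levels=1`, the same call averages adjacent pairs instead. Each Haar level touches every coefficient a constant number of times, so the full decomposition costs O(n), compared with O(n log n) for an FFT. Whether this accounts for the complexity gap reported above is not established by this sketch. A 2D variant could plausibly apply the same step separably along rows and columns and drop the resulting detail sub-bands, but that too is an assumption here.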