Following the traditional paradigm of convolutional neural networks (CNNs), modern CNNs keep pace with more recent models, such as transformer-based ones, by increasing not only model depth and width but also kernel size. This results in a large number of learnable parameters that must be handled during training. While retaining the convolutional paradigm and its spatial inductive bias, we question the significance of \emph{learned} convolution filters. Our findings demonstrate that many contemporary CNN architectures can achieve high test accuracy without ever updating randomly initialized (spatial) convolution filters. Instead, simple linear combinations (implemented through efficient $1\times 1$ convolutions) suffice to recombine even random filters into expressive network operators. Furthermore, these combinations of random filters can implicitly regularize the resulting operations, mitigating overfitting and improving overall performance and robustness. Conversely, retaining the ability to learn filter updates can impair network performance. Lastly, although we observe only relatively small gains from learning $3\times 3$ convolutions, the learning gains increase proportionally with kernel size, owing to the non-idealities of the independent and identically distributed (\textit{i.i.d.}) nature of default initialization techniques.
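The recombination idea can be illustrated with a minimal sketch, which is not the architecture evaluated in this work. It assumes a single input channel, four frozen random $3\times 3$ filters, no nonlinearity between the spatial and $1\times 1$ convolutions, and hand-picked mixing weights; the names `conv2d_valid`, `combine`, `random_filters`, and `mix_weights` are hypothetical. Under these assumptions, mixing the feature maps of random filters with $1\times 1$ weights is equivalent to applying one effective filter that is the same linear combination of the random filters.

```python
import random


def conv2d_valid(image, kernel):
    """Cross-correlate a 2D image with a square 2D kernel (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def combine(arrays, weights):
    """Weighted element-wise sum of equally sized 2D arrays.

    Applied to feature maps, this is a 1x1 convolution with one output channel.
    """
    rows, cols = len(arrays[0]), len(arrays[0][0])
    return [
        [sum(w * arr[i][j] for w, arr in zip(weights, arrays)) for j in range(cols)]
        for i in range(rows)
    ]


rng = random.Random(0)

# Frozen, randomly initialized 3x3 spatial filters (never updated in this sketch).
random_filters = [
    [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(3)] for _ in range(4)
]

# Hypothetical 1x1 mixing weights; in a network these would be the learned parameters.
mix_weights = [0.5, -1.0, 0.25, 2.0]

# Toy 6x6 single-channel input.
image = [[rng.random() for _ in range(6)] for _ in range(6)]

# Path A: apply each random filter, then mix the feature maps with the 1x1 weights.
mixed_maps = combine([conv2d_valid(image, f) for f in random_filters], mix_weights)

# Path B: mix the filters first, then apply the single effective filter.
effective_filter = combine(random_filters, mix_weights)
direct = conv2d_valid(image, effective_filter)

# By linearity of convolution, both paths agree up to floating-point error.
max_abs_diff = max(
    abs(a - b) for row_a, row_b in zip(mixed_maps, direct) for a, b in zip(row_a, row_b)
)
```

In this toy setting only `mix_weights` would need to be trained: the effective filter can be any element of the span of the random filters, which is what allows frozen spatial filters to form expressive operators once enough of them are available.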