Modern CNNs learn the weights of vast numbers of convolutional operators. In this paper, we raise the fundamental question of whether this is actually necessary. We show that even in the extreme case of randomly initializing spatial filters and never updating them, certain CNN architectures can be trained to surpass the accuracy of standard training. By reinterpreting pointwise ($1\times 1$) convolutions as operators that learn linear combinations (LC) of frozen (random) spatial filters, we are able to analyze these effects and propose a generic LC convolution block that allows tuning of the linear combination rate. Empirically, we show that this approach not only reaches high test accuracies on CIFAR and ImageNet but also has favorable properties regarding model robustness, generalization, sparsity, and the total number of necessary weights. Additionally, we propose a novel weight sharing mechanism that shares a single weight tensor between all spatial convolution layers, massively reducing the number of weights.
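The linear-combination reading of pointwise convolutions can be illustrated with a small numerical check. The following is a minimal sketch, not the LC block proposed in the paper: it assumes a single input channel, no padding, and no nonlinearity between the frozen spatial filters and the $1\times 1$ combination. All names (`conv2d_valid`, `lc_forward`, `combined_kernels`) are hypothetical and chosen for illustration. Under these assumptions, mixing the responses of frozen random filters with learnable pointwise weights gives the same output as convolving once with filters that are the corresponding linear combinations of the frozen ones.

```python
import random


def conv2d_valid(image, kernel):
    """Valid (no padding) 2D cross-correlation of one channel with one square kernel."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(w - k + 1)
        ]
        for i in range(h - k + 1)
    ]


def lc_forward(image, frozen_kernels, lc_weights):
    """Apply frozen spatial filters, then a pointwise (1x1) linear combination.

    lc_weights[o][f] mixes the response of frozen filter f into output channel o.
    In this sketch, lc_weights is the only part that would be trained.
    """
    responses = [conv2d_valid(image, kern) for kern in frozen_kernels]
    h, w = len(responses[0]), len(responses[0][0])
    return [
        [[sum(wt * resp[i][j] for wt, resp in zip(row, responses))
          for j in range(w)] for i in range(h)]
        for row in lc_weights
    ]


def combined_kernels(frozen_kernels, lc_weights):
    """Effective spatial filters: the same linear combinations applied to the kernels."""
    k = len(frozen_kernels[0])
    return [
        [[sum(wt * kern[a][b] for wt, kern in zip(row, frozen_kernels))
          for b in range(k)] for a in range(k)]
        for row in lc_weights
    ]


rng = random.Random(0)
image = [[rng.random() for _ in range(6)] for _ in range(6)]

# Four frozen 3x3 filters: randomly initialized and never updated.
frozen = [[[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(3)]
          for _ in range(4)]

# Pointwise weights mapping 4 filter responses to 2 output channels.
lc = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]

out_lc = lc_forward(image, frozen, lc)
out_direct = [conv2d_valid(image, kern) for kern in combined_kernels(frozen, lc)]

# Largest elementwise gap between the two computations (floating-point noise only).
max_diff = max(
    abs(a - b)
    for chan_a, chan_b in zip(out_lc, out_direct)
    for row_a, row_b in zip(chan_a, chan_b)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

The equivalence follows from the linearity of convolution. It would no longer hold exactly if an activation or normalization layer were placed between the spatial filters and the pointwise combination.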