In this paper, we provide a theoretical analysis of the inductive biases in convolutional neural networks (CNNs). We start by examining the universality of CNNs, i.e., their ability to approximate any continuous function. We prove that a depth of $\mathcal{O}(\log d)$ suffices for deep CNNs to achieve this universality, where $d$ is the input dimension. Additionally, we establish that learning sparse functions with CNNs requires only $\widetilde{\mathcal{O}}(\log^2 d)$ samples, indicating that deep CNNs can efficiently capture {\em long-range} sparse correlations. These results are made possible by a novel combination of multichanneling and downsampling as the network depth increases. We also examine the distinct roles of weight sharing and locality in CNNs. To this end, we compare the performance of CNNs, locally-connected networks (LCNs), and fully-connected networks (FCNs) on a simple regression task, where LCNs can be viewed as CNNs without weight sharing. On the one hand, we prove that LCNs require $\Omega(d)$ samples while CNNs need only $\widetilde{\mathcal{O}}(\log^2 d)$ samples, highlighting the critical role of weight sharing. On the other hand, we prove that FCNs require $\Omega(d^2)$ samples, whereas LCNs need only $\widetilde{\mathcal{O}}(d)$ samples, underscoring the importance of locality. These provable separations quantify the difference between the two biases. The key observation behind our proofs is that weight sharing and locality break different symmetries in the learning process.
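As a rough illustration of the scalings above, the following Python sketch counts how many downsampling layers are needed before a single output depends on all $d$ inputs, and compares first-layer parameter counts for the three architectures. The filter size 2, stride 2, single channel, and the function names are assumptions made for this example, not the construction analyzed in the paper. Parameter counts are only a heuristic proxy: the stated bounds concern sample complexity and are proved via symmetry arguments, not by counting weights.

```python
def depth_for_full_receptive_field(d):
    """Layers of (filter size 2, stride 2) needed to reduce length d to 1.

    Assumes d is a power of two, so each layer exactly halves the length
    and the depth equals log2(d).
    """
    length, depth = d, 0
    while length > 1:
        length //= 2  # a size-2, stride-2 filter halves the spatial length
        depth += 1
    return depth


def first_layer_params(d):
    """First-layer weight counts (single channel, no bias) for each model.

    Hypothetical setup: size-2 patches with stride 2, giving d // 2 outputs.
    """
    positions = d // 2
    return {
        "cnn": 2,              # one size-2 filter shared across all positions
        "lcn": 2 * positions,  # a separate size-2 filter at every position
        "fcn": d * positions,  # every output connects to every input
    }


if __name__ == "__main__":
    d = 1024
    print(depth_for_full_receptive_field(d))
    print(first_layer_params(d))
```

In this toy setup the depth grows like $\log_2 d$, and the three weight counts grow like $\mathcal{O}(1)$, $\mathcal{O}(d)$, and $\mathcal{O}(d^2)$, which mirrors the ordering of the separations stated above without reproducing their proofs.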