Vision tasks are characterized by the properties of locality and translation invariance. The superior performance of convolutional neural networks (CNNs) on these tasks is widely attributed to the inductive biases of locality and weight sharing built into their architecture. Existing attempts to quantify the statistical benefits of these biases in CNNs over locally connected convolutional neural networks (LCNs) and fully connected neural networks (FCNs) fall into one of two categories: they either disregard the optimizer and provide only uniform convergence upper bounds with no separating lower bounds, or they consider simplistic tasks that do not capture the locality and translation invariance found in real-world vision tasks. To address these deficiencies, we introduce the Dynamic Signal Distribution (DSD) classification task, which models an image as $k$ patches, each of dimension $d$, with the label determined by a $d$-sparse signal vector that can freely appear in any one of the $k$ patches. On this task, for any orthogonally equivariant algorithm such as gradient descent, we prove that CNNs require $\tilde{O}(k+d)$ samples, whereas LCNs require $\Omega(kd)$ samples, establishing the statistical advantage of weight sharing in translation invariant tasks. Furthermore, LCNs need $\tilde{O}(k(k+d))$ samples, compared to $\Omega(k^2d)$ samples for FCNs, showcasing the benefit of locality in local tasks. Additionally, we develop information theoretic tools for analyzing randomized algorithms, which may be of interest for statistical research.
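To make the DSD setup concrete, the following is a minimal sketch of a toy generator for a DSD-like task, together with a predictor that applies one shared filter to every patch. It is an illustration under stated assumptions, not the construction analyzed above: the uniform $\pm 1$ labels, the uniformly chosen patch, the additive Gaussian noise, and the names `sample_dsd`, `shared_filter_predict`, and `noise_std` are all hypothetical choices made here.

```python
import random


def sample_dsd(k, d, signal, noise_std=0.1, rng=random):
    """Draw one (image, label, patch) triple from a toy DSD-like task.

    Assumptions (not taken from the source): the label is uniform in
    {-1, +1}, the signal patch is uniform over the k positions, and every
    coordinate carries independent Gaussian noise.
    """
    assert len(signal) == d
    label = rng.choice([-1, 1])
    patch = rng.randrange(k)
    # Flattened image with k patches of dimension d each.
    image = [rng.gauss(0.0, noise_std) for _ in range(k * d)]
    # Plant the signed signal in exactly one patch, so the planted
    # vector is d-sparse in the full k*d-dimensional input.
    for j in range(d):
        image[patch * d + j] += label * signal[j]
    return image, label, patch


def shared_filter_predict(image, k, d, w):
    """Apply one shared filter w to every patch (weight sharing) and
    return the sign of the response with the largest magnitude."""
    responses = [
        sum(w[j] * image[p * d + j] for j in range(d)) for p in range(k)
    ]
    strongest = max(responses, key=abs)
    return 1 if strongest >= 0 else -1
```

The shared-filter predictor has only $d$ weights because the same filter is reused at all $k$ positions. A locally connected analogue would keep a separate filter per patch, and a fully connected one would place no patch structure on the weights at all, which is the intuition behind the sample complexity gaps stated above.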