The Strong Lottery Ticket Hypothesis (SLTH) stipulates the existence of a subnetwork within a sufficiently overparametrized (dense) neural network that, when initialized randomly and without any training, achieves the accuracy of a fully trained target network. Recent works by Da Cunha et al. (2022) and Burkholz (2022) demonstrate that the SLTH can be extended to translation-equivariant networks, i.e. CNNs, with the same level of overparametrization as needed for strong lottery tickets (SLTs) in dense networks. However, modern neural networks can incorporate more than just translation symmetry, and developing architectures that are equivariant to more general symmetries, such as rotations and permutations, has been a powerful design principle. In this paper, we generalize the SLTH to functions that preserve the action of a group $G$, i.e. $G$-equivariant networks, and prove that, with high probability, one can approximate any $G$-equivariant network of fixed width and depth by pruning a randomly initialized overparametrized $G$-equivariant network to a $G$-equivariant subnetwork. We further prove that our prescribed overparametrization scheme is optimal and provides a lower bound on the number of effective parameters as a function of the error tolerance. We develop our theory for a large range of groups, including subgroups of the Euclidean group $\text{E}(2)$ and of the symmetric group, $G \leq \mathcal{S}_n$, allowing us to find SLTs for MLPs, CNNs, $\text{E}(2)$-steerable CNNs, and permutation-equivariant networks as specific instantiations of our unified framework. Empirically, we verify our theory by pruning overparametrized $\text{E}(2)$-steerable CNNs, $k$-order GNNs, and message-passing GNNs to match the performance of trained target networks.
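To make the requirement of a "$G$-equivariant subnetwork" concrete, the following is a minimal toy sketch, not the construction used in the paper. It assumes the cyclic shift group acting on $\mathbb{R}^n$ and a single linear layer written as a combination of shift matrices. All names (`shift_basis`, `equivariant_layer`, `is_equivariant`) are hypothetical. The sketch illustrates that masking coefficients of an equivariant basis keeps the layer equivariant, whereas masking an individual weight entry generally does not.

```python
import numpy as np

def shift_basis(n):
    """Basis of linear maps on R^n that commute with cyclic shifts:
    the n powers of the one-step shift matrix (toy choice of group)."""
    P = np.roll(np.eye(n), 1, axis=0)  # (P @ x)[i] == x[i - 1], a cyclic shift
    return np.stack([np.linalg.matrix_power(P, k) for k in range(n)])

def equivariant_layer(coeffs, mask, basis):
    """Build W = sum_k mask[k] * coeffs[k] * basis[k].
    Pruning acts on basis coefficients, not on individual entries of W."""
    return np.tensordot(mask * coeffs, basis, axes=1)

def is_equivariant(W, P, atol=1e-10):
    """W is shift-equivariant iff it commutes with the generator P."""
    return np.allclose(W @ P, P @ W, atol=atol)

rng = np.random.default_rng(0)
n = 6
basis = shift_basis(n)
P = basis[1]                          # generator of the cyclic shift group
coeffs = rng.uniform(-1, 1, size=n)   # random initialization, never trained

# Pruning in the equivariant basis: drop three of the six coefficients.
coeff_mask = np.array([1, 0, 1, 1, 0, 0], dtype=float)
W_pruned = equivariant_layer(coeffs, coeff_mask, basis)

# Unstructured pruning for contrast: zero a single entry of the dense matrix.
W_dense = equivariant_layer(coeffs, np.ones(n), basis)
entry_mask = np.ones((n, n))
entry_mask[0, 0] = 0.0
W_unstructured = W_dense * entry_mask

pruned_ok = is_equivariant(W_pruned, P)              # symmetry preserved
unstructured_ok = is_equivariant(W_unstructured, P)  # symmetry broken
```

Here `W_pruned` is a polynomial in the shift matrix and therefore commutes with it, while zeroing one entry of `W_dense` destroys the circulant structure. This only demonstrates why pruning must respect the symmetry. It says nothing about approximation quality or the required amount of overparametrization.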