Pruning is a standard technique for reducing the computational cost of deep networks. Many advances in pruning build on concepts from the Lottery Ticket Hypothesis (LTH). LTH reveals that inside a trained dense network there exist sparse subnetworks (tickets) able to achieve similar accuracy (i.e., to win the lottery, hence "winning tickets"). Pruning at initialization focuses on finding winning tickets without training a dense network. Studies on these concepts typically obtain subnetworks through weight or filter pruning. In this work, we investigate LTH and pruning at initialization through the lens of layer pruning. First, we confirm the existence of winning tickets when the pruning process removes layers. Motivated by this observation, we propose to discover these winning tickets at initialization, eliminating the need for heavy computational resources to train the initial (over-parameterized) dense network. Extensive experiments show that our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%, an important step towards democratization and green Artificial Intelligence. Beyond computational benefits, our winning tickets exhibit robustness against adversarial and out-of-distribution examples. Finally, we show that our subnetworks easily win the lottery at initialization, whereas tickets from filter removal (the standard structured LTH) rarely become winning tickets.