Efficient Lottery Ticket Finding: Less Data is More

The lottery ticket hypothesis (LTH) reveals the existence of winning tickets (sparse but critical subnetworks) within dense networks that can be trained in isolation from random initialization to match the dense networks' accuracy. However, finding winning tickets requires burdensome computation in the train-prune-retrain process, especially on large-scale datasets (e.g., ImageNet), which restricts their practical benefits. This paper explores a new perspective on finding lottery tickets more efficiently: using only a specially selected subset of data, called the Pruning-Aware Critical set (PrAC set), rather than the full training set. The concept of the PrAC set was inspired by the recent observation that deep networks have samples that are either hard to memorize during training or easy to forget during pruning. A PrAC set is thus hypothesized to capture the most challenging and informative examples for the dense model. We observe that a high-quality winning ticket can be found by training and pruning the dense network on the very compact PrAC set, which substantially reduces the training iterations needed for the ticket-finding process. Extensive experiments validate our proposal across diverse datasets and network architectures. Specifically, on CIFAR-10, CIFAR-100, and Tiny ImageNet, we locate effective PrAC sets at 35.32%~78.19% of their training set sizes. Using these sets, we obtain equally competitive winning tickets for the corresponding dense networks while saving up to 82.85%~92.77%, 63.54%~74.92%, and 76.14%~86.56% of training iterations, respectively. Crucially, we show that a PrAC set, once found, is reusable across different network architectures, which amortizes the extra cost of finding it and yields a practical regime for efficient lottery ticket finding.
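The selection idea described above (samples that are hard to memorize during training, or easy to forget during pruning) can be illustrated with a minimal sketch. The abstract does not state the exact selection criteria, so everything here is an assumption made for illustration: the function names `count_forgetting_events` and `select_prac_indices` are hypothetical, "hard to memorize" is approximated as "misclassified at some epoch after having been classified correctly, or never classified correctly", and "easy to forget" is approximated as "classified correctly by the dense model but not by the pruned one". The sketch only needs NumPy and per-sample correctness records.

```python
import numpy as np


def count_forgetting_events(correct_history):
    """Count, per sample, transitions from correct to incorrect across epochs.

    correct_history: boolean array of shape (epochs, samples), where entry
    [t, i] is True if sample i was classified correctly at epoch t.
    """
    h = np.asarray(correct_history, dtype=bool)
    # A forgetting event: correct at epoch t, incorrect at epoch t + 1.
    return np.sum(h[:-1] & ~h[1:], axis=0)


def select_prac_indices(correct_history, dense_correct, pruned_correct):
    """Return sorted indices of samples in an illustrative PrAC-style subset.

    Assumed criteria (not taken from the paper):
      - hard to memorize: forgotten at least once, or never learned;
      - easy to forget: correct under the dense model, wrong after pruning.
    """
    h = np.asarray(correct_history, dtype=bool)
    forgotten_in_training = count_forgetting_events(h) > 0
    never_learned = ~h.any(axis=0)
    hard_to_memorize = forgotten_in_training | never_learned

    dense_ok = np.asarray(dense_correct, dtype=bool)
    pruned_ok = np.asarray(pruned_correct, dtype=bool)
    easy_to_forget = dense_ok & ~pruned_ok

    # The subset is the union of both groups.
    return np.flatnonzero(hard_to_memorize | easy_to_forget)


# Toy example: 3 epochs, 4 samples.
history = np.array(
    [
        [1, 0, 1, 0],
        [1, 0, 0, 1],
        [1, 0, 1, 1],
    ],
    dtype=bool,
)
dense_correct = np.array([1, 0, 1, 1], dtype=bool)
pruned_correct = np.array([1, 0, 1, 0], dtype=bool)

prac = select_prac_indices(history, dense_correct, pruned_correct)
print(prac.tolist())                  # [1, 2, 3]
print(len(prac) / history.shape[1])   # 0.75
```

In this toy run, sample 0 is always classified correctly and survives pruning, so it is left out; samples 1 (never learned), 2 (forgotten during training), and 3 (forgotten after pruning) are kept. Training and pruning would then proceed on the kept indices only, which is where the iteration savings reported above would come from.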
