We develop an approach to growing deep network architectures over the course of training, driven by a principled combination of accuracy and sparsity objectives. Unlike existing pruning or architecture search techniques that operate on full-sized models or supernet architectures, our method can start from a small, simple seed architecture and dynamically grow and prune both layers and filters. By combining a continuous relaxation of discrete network structure optimization with a scheme for sampling sparse subnetworks, we produce compact, pruned networks, while also drastically reducing the computational expense of training. For example, we achieve savings of $49.7\%$ in inference FLOPs and $47.4\%$ in training FLOPs compared to a baseline ResNet-50 on ImageNet, while maintaining $75.2\%$ top-1 accuracy -- all without any dedicated fine-tuning stage. Experiments across CIFAR, ImageNet, PASCAL VOC, and Penn Treebank, with convolutional networks for image classification and semantic segmentation, and recurrent networks for language modeling, demonstrate that we both train faster and produce more efficient networks than competing architecture pruning or search methods.