Although deep learning has made great progress in recent years, the exploding economic and environmental costs of training neural networks are becoming unsustainable. To address this problem, there has been a great deal of research on *algorithmically efficient deep learning*, which seeks to reduce training costs not at the hardware or implementation level, but through changes in the semantics of the training program. In this paper, we present a structured and comprehensive overview of the research in this field. First, we formalize the *algorithmic speedup* problem, then we use fundamental building blocks of algorithmically efficient training to develop a taxonomy. Our taxonomy highlights commonalities of seemingly disparate methods and reveals current research gaps. Next, we present evaluation best practices to enable comprehensive, fair, and reliable comparisons of speedup techniques. To further aid research and applications, we discuss common bottlenecks in the training pipeline (illustrated via experiments) and offer taxonomic mitigation strategies for them. Finally, we highlight some unsolved research challenges and present promising future directions.