Training deep neural networks (DNNs) is time-consuming. While most existing solutions try to overlap or schedule computation and communication for efficient training, this paper goes one step further by skipping computation and communication through DNN layer freezing. Our key insight is that the training progress of internal DNN layers differs significantly, and front layers often become well-trained much earlier than deep layers. To explore this, we first introduce the notion of training plasticity to quantify the training progress of internal DNN layers. We then design Egeria, a knowledge-guided DNN training system that employs semantic knowledge from a reference model to accurately evaluate individual layers' training plasticity and safely freeze the converged ones, saving their corresponding backward computation and communication. The reference model is generated on the fly using quantization techniques and runs forward operations asynchronously on available CPUs to minimize overhead. In addition, Egeria caches the intermediate outputs of the frozen layers, with prefetching, to further skip their forward computation. Our implementation and testbed experiments with popular vision and language models show that Egeria achieves a 19%-43% training speedup over the state of the art without sacrificing accuracy.
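The freezing decision described above can be illustrated with a minimal sketch. It is not Egeria's implementation: the abstract does not specify the plasticity metric or the freezing policy, so the mean-squared activation gap, the `window` and `tolerance` parameters, the front-to-back freezing order, and the names `activation_gap` and `PlasticityTracker` are all assumptions made for illustration.

```python
from collections import deque


def activation_gap(train_act, ref_act):
    """Mean squared difference between two equally sized activation vectors.

    Assumption: stands in for a plasticity signal; the real metric may differ.
    """
    if len(train_act) != len(ref_act):
        raise ValueError("activation vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(train_act, ref_act)) / len(train_act)


class PlasticityTracker:
    """Tracks each layer's gap to a reference model and freezes stable front layers."""

    def __init__(self, num_layers, window=3, tolerance=1e-3):
        self.window = window        # hypothetical: evaluations needed before freezing
        self.tolerance = tolerance  # hypothetical: maximum spread counted as "stable"
        self.history = [deque(maxlen=window) for _ in range(num_layers)]
        self.frozen = [False] * num_layers

    def is_stationary(self, layer):
        """True once the last `window` gaps of a layer vary by less than `tolerance`."""
        gaps = self.history[layer]
        if len(gaps) < self.window:
            return False
        return max(gaps) - min(gaps) < self.tolerance

    def update(self, train_acts, ref_acts):
        """Record one evaluation round and return the indices of frozen layers."""
        for i, (train_act, ref_act) in enumerate(zip(train_acts, ref_acts)):
            if not self.frozen[i]:
                self.history[i].append(activation_gap(train_act, ref_act))
        # Assumed policy: freeze front to back, stopping at the first layer
        # whose gap is still changing.
        for i in range(len(self.frozen)):
            if self.frozen[i]:
                continue
            if not self.is_stationary(i):
                break
            self.frozen[i] = True
        return [i for i, is_frozen in enumerate(self.frozen) if is_frozen]
```

In a real training loop, the returned indices would mark layers whose backward pass and gradient synchronization can be skipped, for example by disabling gradients on their parameters, and whose forward outputs become candidates for caching.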