Understanding the training dynamics of deep neural networks (DNNs) is important because it can lead to improved training efficiency and task performance. Recent work has shown that a static graph representation of a DNN's wiring cannot capture how the network changes over the course of training. In this work, we therefore propose a compact, expressive temporal graph framework that effectively captures the dynamics of many workhorse architectures in computer vision. Specifically, the framework extracts an informative summary of graph properties (e.g., eigenvector centrality) over a sequence of DNN graphs obtained during training. We demonstrate that it captures useful dynamics by accurately predicting the task performance of the trained network from a summary over only the early training epochs (<5), across four different architectures and two image datasets. Moreover, by using a novel, highly scalable DNN graph representation, we show that the proposed framework captures generalizable dynamics: summaries extracted from smaller-width networks remain effective when evaluated on larger widths.
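To make the idea of a temporal summary of graph properties concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain fully connected network whose checkpoints are lists of weight matrices, treats each neuron as a node with edge weight equal to the absolute weight, and summarizes eigenvector centrality per checkpoint with the mean, standard deviation, and maximum. The function names (`layers_to_adjacency`, `eigenvector_centrality`, `temporal_summary`), the graph construction, and the choice of statistics are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def layers_to_adjacency(weights):
    """Hypothetical graph construction: neurons are nodes, |w| is the edge weight.

    `weights[i]` is assumed to have shape (units in layer i, units in layer i+1).
    """
    sizes = [weights[0].shape[0]] + [w.shape[1] for w in weights]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    n = int(offsets[-1])
    adj = np.zeros((n, n))
    for i, w in enumerate(weights):
        rows = slice(offsets[i], offsets[i + 1])      # nodes of layer i
        cols = slice(offsets[i + 1], offsets[i + 2])  # nodes of layer i+1
        adj[rows, cols] = np.abs(w)
    return adj + adj.T  # symmetrize so the graph is undirected

def eigenvector_centrality(adj):
    """Leading eigenvector of a symmetric non-negative adjacency matrix."""
    _, vecs = np.linalg.eigh(adj)   # eigenvalues are returned in ascending order
    v = np.abs(vecs[:, -1])         # eigenvector of the largest eigenvalue
    return v / np.linalg.norm(v)

def temporal_summary(checkpoints):
    """Stack per-checkpoint centrality statistics into one feature vector."""
    rows = []
    for weights in checkpoints:
        c = eigenvector_centrality(layers_to_adjacency(weights))
        rows.append([c.mean(), c.std(), c.max()])  # assumed summary statistics
    return np.asarray(rows).ravel()

# Simulated "early training" checkpoints: a base network plus growing perturbations.
rng = np.random.default_rng(0)
base = [rng.normal(size=(4, 8)), rng.normal(size=(8, 3))]
checkpoints = [[w + 0.1 * t * rng.normal(size=w.shape) for w in base]
               for t in range(5)]

features = temporal_summary(checkpoints)
print(features.shape)  # (15,): 5 checkpoints x 3 statistics
```

Under these assumptions, a feature vector like `features` could serve as the input to a separate regressor that predicts final task performance. The dense eigendecomposition used here is chosen for brevity and would not scale to realistic network sizes.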