As neural networks grow in scale, their training becomes both more computationally demanding and richer in dynamics. Amid growing interest in these training dynamics, we present a novel observation: parameters exhibit intrinsic correlations over the course of training. Building on this, we introduce Correlation Mode Decomposition (CMD), an algorithm that clusters the parameter space into groups, termed modes, whose members behave in a synchronized manner across epochs. This enables CMD to efficiently represent the training dynamics of complex networks, such as ResNets and Transformers, using only a few modes. Moreover, the resulting model shows improved test-set generalization. We also introduce an efficient CMD variant designed to run concurrently with training. Our experiments indicate that CMD surpasses the state-of-the-art method for compactly modeling dynamics on image classification. Preliminary experiments in the context of federated learning further show that our modeling can improve training efficiency and lower communication overhead.
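To make the idea of correlation-based modes concrete, the following is a minimal sketch and not the authors' implementation. It assumes the parameter trajectories are stored as a matrix with one row per parameter and one column per epoch, that no parameter is constant over training, and that each parameter can be approximated as an affine function of one reference parameter per mode. The function names `correlation_modes` and `reconstruct`, the greedy choice of reference parameters, and the affine fit are all illustrative assumptions.

```python
import numpy as np


def correlation_modes(trajectories, n_modes):
    """Group parameters by how strongly their trajectories correlate.

    trajectories: array of shape (n_params, n_epochs).
    Returns reference indices, a mode label per parameter, and affine
    coefficients (a, b) such that w_i(t) ~ a_i * w_ref(t) + b_i.
    Assumption: no parameter is constant across epochs.
    """
    mean = trajectories.mean(axis=1, keepdims=True)
    centered = trajectories - mean
    norms = np.linalg.norm(centered, axis=1)
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit = centered / safe_norms[:, None]  # rows have unit norm

    # Hypothetical reference selection: start from the parameter with the
    # largest variation, then greedily add the parameter least correlated
    # with the references chosen so far.
    refs = [int(np.argmax(norms))]
    for _ in range(n_modes - 1):
        max_corr = np.abs(unit @ unit[refs].T).max(axis=1)
        refs.append(int(np.argmin(max_corr)))
    ref_idx = np.array(refs)

    # Assign every parameter to the reference with the highest absolute
    # Pearson correlation over epochs.
    corr = unit @ unit[ref_idx].T  # shape (n_params, n_modes)
    labels = np.argmax(np.abs(corr), axis=1)

    # Least-squares affine fit of each parameter to its mode's reference.
    ref_centered = centered[ref_idx][labels]
    denom = (ref_centered ** 2).sum(axis=1)
    denom = np.where(denom == 0, 1.0, denom)
    a = (centered * ref_centered).sum(axis=1) / denom
    b = mean[:, 0] - a * trajectories[ref_idx][labels].mean(axis=1)
    return ref_idx, labels, a, b


def reconstruct(trajectories, ref_idx, labels, a, b):
    """Rebuild all trajectories from the reference trajectories alone."""
    return a[:, None] * trajectories[ref_idx][labels] + b[:, None]
```

Under these assumptions, only the reference trajectories and two scalars per parameter are needed to approximate the full training history, which is one way to read the claim that a few modes suffice to represent the dynamics.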