The rapid evolution of deep learning and large language models has driven exponential growth in the demand for training data, prompting the development of Dataset Distillation methods to address the challenges of managing large datasets. Among these, Matching Training Trajectories (MTT) is a prominent approach that optimizes a synthetic dataset so that training on it replicates the training trajectory of an expert network trained on real data. However, our investigation found that this method suffers from three significant limitations: (1) the expert trajectory generated by Stochastic Gradient Descent (SGD) is unstable; (2) the distillation process converges slowly; and (3) storing the expert trajectory is costly. To address these issues, we offer a new perspective on the essence of Dataset Distillation and MTT through a simple transformation of the objective function, and introduce a novel method called Matching Convexified Trajectory (MCT), which aims to provide better guidance for the student trajectory. MCT leverages insights from the linearized dynamics of Neural Tangent Kernel methods to create a convex combination of expert trajectories, guiding the student network to converge rapidly and stably. This convexified trajectory is not only easier to store but also enables a continuous sampling strategy during distillation, ensuring that the entire expert trajectory is learned and fitted thoroughly. Comprehensive experiments on three public datasets validate the superiority of MCT over traditional MTT methods.
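The idea of a convexified trajectory with continuous sampling can be illustrated with a small sketch. This is not the authors' implementation: it assumes, for illustration only, that the convex combination is taken between just the first and last expert checkpoints with a scalar coefficient `beta` in [0, 1], and that positions are sampled uniformly. The names `convexified_point` and `sample_start` are hypothetical.

```python
import random


def convexified_point(theta_start, theta_end, beta):
    """Return (1 - beta) * theta_start + beta * theta_end, elementwise.

    Assumption: parameters are flat lists of floats and the trajectory is
    the straight segment between two stored checkpoints.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [(1.0 - beta) * s + beta * e for s, e in zip(theta_start, theta_end)]


def sample_start(theta_start, theta_end, rng):
    """Draw a continuous position on the segment (uniform sampling assumed)."""
    beta = rng.random()  # any real value in [0, 1), not only stored steps
    return beta, convexified_point(theta_start, theta_end, beta)


# Toy "expert" checkpoints: initial and final parameter vectors.
theta_0 = [0.0, 2.0, -1.0]
theta_N = [1.0, 0.0, 3.0]

rng = random.Random(0)
beta, theta = sample_start(theta_0, theta_N, rng)
```

Under these assumptions, only the two endpoint checkpoints need to be kept, and any intermediate point can be recomputed on demand, which is one way to read the abstract's storage and continuous-sampling claims.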