DIET: Learning to Distill Dataset Continually for Recommender Systems

Modern deep recommender models are trained under a continual learning paradigm on massive, continuously growing streams of behavioral logs. On large-scale platforms, retraining models on the full historical data to compare or iterate on architectures is prohibitively expensive and severely slows model development. This calls for data-efficient approaches that faithfully approximate full-data training behavior without repeatedly processing the entire evolving data stream. We formulate this problem as \emph{streaming dataset distillation for recommender systems} and propose \textbf{DIET}, a unified framework that maintains a compact distilled dataset which evolves alongside the streaming data while preserving training-critical signals. Unlike existing dataset distillation methods, which construct a static distilled set, DIET treats distilled data as an evolving training memory and updates it stage by stage so that it stays aligned with long-term training dynamics. Within a bi-level optimization framework, DIET achieves effective continual distillation through principled initialization from influential samples and selective updates guided by influence-aware memory addressing. Experiments on large-scale recommendation benchmarks show that DIET compresses training data to as little as \textbf{1--2\%} of the original size while preserving performance trends consistent with full-data training, reducing model iteration cost by up to \textbf{60$\times$}. Moreover, the distilled datasets produced by DIET generalize well across model architectures, highlighting streaming dataset distillation as a scalable and reusable data foundation for recommender system development.
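
To make the idea of a stage-wise, influence-guided memory concrete, the following Python sketch maintains a fixed-size memory over a data stream. It is an illustration under strong assumptions and not DIET's algorithm: it swaps in a simple first-order influence proxy (alignment of a sample's gradient with the mean gradient on a held-out reference batch) and it selects real samples instead of optimizing distilled samples in a bi-level loop. The logistic model and every name here (`update_memory`, `run_stream`, `max_swaps`, `make_stage`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad(w, x, y):
    # Gradient of the logistic loss for one sample (x, y) with respect to w.
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def mean_grad(w, batch):
    # Average per-sample gradient over a batch.
    g = [0.0] * len(w)
    for x, y in batch:
        for i, gi in enumerate(grad(w, x, y)):
            g[i] += gi / len(batch)
    return g


def influence(w, sample, ref_grad):
    # Assumed proxy: a positive dot product means a gradient step on this
    # sample lowers the reference loss to first order.
    x, y = sample
    return sum(a * b for a, b in zip(grad(w, x, y), ref_grad))


def update_memory(memory, stage, w, ref_batch, capacity, max_swaps):
    ref_grad = mean_grad(w, ref_batch)
    score = lambda s: influence(w, s, ref_grad)
    if not memory:
        # Initialization: keep the most influential samples of the first stage.
        return sorted(stage, key=score, reverse=True)[:capacity]
    # Selective update: pair the weakest slots with the strongest incoming
    # samples and overwrite a slot only when the newcomer scores higher.
    slots = sorted(memory, key=score)
    incoming = sorted(stage, key=score, reverse=True)[:max_swaps]
    for i, cand in enumerate(incoming):
        if i >= len(slots) or score(cand) <= score(slots[i]):
            break
        slots[i] = cand
    return slots


def sgd_epoch(w, data, lr=0.1):
    # One pass of plain SGD over the memory.
    for x, y in data:
        w = [wi - lr * gi for wi, gi in zip(w, grad(w, x, y))]
    return w


def run_stream(stages, ref_batch, capacity, max_swaps):
    # Alternate memory updates and model updates, one round per stage.
    w = [0.0] * len(stages[0][0][0])
    memory = []
    for stage in stages:
        memory = update_memory(memory, stage, w, ref_batch, capacity, max_swaps)
        w = sgd_epoch(w, memory)
    return memory, w


def make_stage(n, dim, rng):
    # Synthetic stage: the label is 1 when the feature sum is positive.
    stage = []
    for _ in range(n):
        x = tuple(rng.uniform(-1.0, 1.0) for _ in range(dim))
        stage.append((x, 1 if sum(x) > 0 else 0))
    return stage


if __name__ == "__main__":
    rng = random.Random(0)
    stages = [make_stage(1000, 4, rng) for _ in range(3)]
    ref_batch = make_stage(100, 4, rng)
    memory, w = run_stream(stages, ref_batch, capacity=20, max_swaps=5)
    print(len(memory), "samples kept out of", sum(len(s) for s in stages))
```

In this toy setting the memory holds 20 samples while each stage contributes 1000, and at most `max_swaps` slots change per stage, so most of the memory persists across stages. The model is trained only on the memory, which is the sense in which a compact set stands in for the full stream.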
