Slow concept drift is a ubiquitous yet under-studied problem in practical machine learning systems. In such settings, although recent data is more indicative of future data, naively prioritizing recent instances risks losing valuable information from the past. We propose an optimization-driven approach to balancing instance importance over large training windows. First, we model instance relevance using a mixture of multiple timescales of decay, allowing us to capture rich temporal trends. Second, we learn an auxiliary scorer model that recovers the appropriate mixture of timescales as a function of the instance itself. Finally, we propose a nested optimization objective for learning the scorer, by which it maximizes forward transfer for the learned model. Experiments on a large real-world dataset of 39M photos over a 9-year period show up to 15% relative gains in accuracy compared to other robust learning baselines. We replicate our gains on two collections of real-world datasets for non-stationary learning, and extend our work to continual learning settings, where we also outperform state-of-the-art (SOTA) methods by large margins.
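To make the first two ideas concrete, the following is a minimal sketch of an instance weight built from a mixture of decay timescales. It is an illustration under stated assumptions, not the method's actual implementation: the exponential form of each decay, the `TIMESCALES` values, and the function names are all hypothetical, and the learned scorer is replaced by fixed per-instance logits. The nested optimization that trains the scorer is not shown.

```python
import math

# Hypothetical decay constants (e.g. in months); not taken from the paper.
TIMESCALES = [1.0, 12.0, 120.0]


def softmax(logits):
    """Turn raw scores into non-negative mixture weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_decay_weight(age, mixture, timescales=TIMESCALES):
    """Importance of an instance that is `age` time units old.

    Assumes each timescale contributes an exponential decay exp(-age / tau),
    combined with the instance-specific mixture weights.
    """
    if len(mixture) != len(timescales):
        raise ValueError("mixture and timescales must have the same length")
    return sum(m * math.exp(-age / tau) for m, tau in zip(mixture, timescales))


# Stand-ins for the scorer's output on two instances: one whose relevance
# fades quickly, one that stays informative for a long time.
fast_fading = softmax([3.0, 0.0, -3.0])
long_lived = softmax([-3.0, 0.0, 3.0])

for age in (0, 6, 60):
    w_fast = mixture_decay_weight(age, fast_fading)
    w_long = mixture_decay_weight(age, long_lived)
    print(f"age={age:>3}  fast_fading={w_fast:.4f}  long_lived={w_long:.4f}")
```

In this sketch, two instances of the same age can receive very different training weights, which is the behavior an instance-dependent mixture is meant to allow; a single global decay rate could not express that.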