Deploying machine learning models in the real world is challenging because data evolves over time. No model can cope with data that evolves arbitrarily, but if the changes follow some pattern, we may be able to design methods that address them. This paper addresses the setting in which data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the data distribution, which allows us to selectively sample past data to update the model: not only past data that is similar to the current data, as a standard propensity score would select, but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems, ranging from supervised learning (e.g., image classification), where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control), where data shifts as the policy or the task changes.
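To make the baseline concrete, the following is a minimal sketch of a standard (not time-varying) propensity score used to reweight past data toward the most recent time step. It is an illustration under stated assumptions, not the paper's implementation: the 1-D Gaussian drift, the logistic-regression estimator, and all function names (`simulate_gradual_shift`, `fit_logistic`, `propensity_weights`, `resample_past`) are hypothetical choices made here. A time-varying score would additionally condition on time, which this sketch does not attempt.

```python
import numpy as np


def simulate_gradual_shift(n_steps=20, n_per_step=200, drift=0.1, seed=0):
    """Toy data (assumption): at step t, x ~ N(drift * t, 1)."""
    rng = np.random.default_rng(seed)
    t = np.repeat(np.arange(n_steps), n_per_step)
    x = rng.normal(loc=drift * t, scale=1.0)
    return x, t


def fit_logistic(x, y, lr=0.5, n_iter=5000):
    """Plain gradient descent on the logistic loss for a 1-D feature."""
    w, b = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        err = p - y
        w -= lr * np.mean(err * x)
        b -= lr * np.mean(err)
    return w, b


def propensity_weights(x, t):
    """Estimate the density ratio p_recent(x) / p_past(x) for every sample.

    A classifier separates the latest time step (label 1) from all
    earlier steps (label 0); its odds, corrected for class imbalance,
    approximate the density ratio.
    """
    is_recent = (t == t.max()).astype(float)
    w, b = fit_logistic(x, is_recent)
    odds = np.exp(w * x + b)
    prior_odds = is_recent.mean() / (1.0 - is_recent.mean())
    return odds / prior_odds


def resample_past(x, t, weights, k, seed=0):
    """Draw k past samples with probability proportional to their weight."""
    rng = np.random.default_rng(seed)
    past = np.flatnonzero(t < t.max())
    probs = weights[past] / weights[past].sum()
    return x[rng.choice(past, size=k, replace=True, p=probs)]


if __name__ == "__main__":
    x, t = simulate_gradual_shift()
    weights = propensity_weights(x, t)
    replay = resample_past(x, t, weights, k=500)
    print("mean of all past data:", x[t < t.max()].mean())
    print("mean of resampled past data:", replay.mean())
```

Because the weights depend only on `x`, this baseline selects past samples that look like current data; it has no notion of how the distribution moved over time, which is the gap the time-varying score is meant to close.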