Many real-world applications require machine-learning models to cope with non-stationary data distributions and therefore to learn autonomously over extended periods of time, often in an online setting. One of the main challenges in this scenario is catastrophic forgetting (CF), in which the model focuses on the most recent tasks while its predictive performance on older ones degrades. In the online setting, the most effective solutions employ a fixed-size memory buffer that stores old samples for replay when training on new tasks. Many such approaches have been proposed. However, it remains unclear how predictive uncertainty can be leveraged most effectively for memory management, and conflicting strategies have been proposed for populating the memory: are the easiest-to-forget or the easiest-to-remember samples more effective in combating CF? Starting from the intuition that predictive uncertainty indicates a sample's location in the decision space, this work presents an in-depth analysis of different uncertainty estimates and strategies for populating the memory. The investigation provides a better understanding of the characteristics data points should have in order to alleviate CF. We then propose an alternative method for estimating predictive uncertainty via the generalised variance induced by the negative log-likelihood. Finally, we demonstrate that predictive uncertainty measures help reduce CF in different settings.
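To make the question of easiest-to-forget versus easiest-to-remember samples concrete, the following is a minimal sketch of uncertainty-ranked memory population. It is an illustration under stated assumptions, not the method proposed in this work: predictive entropy is used as a stand-in uncertainty score (the generalised-variance estimate is not reproduced here), and the function names `predictive_entropy` and `select_for_memory`, the `per_class` budget, and the `"top"`/`"bottom"` strategy labels are all hypothetical.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities.

    Higher values mean the model is less certain about that sample.
    Entropy is an assumed stand-in score, not the paper's estimator.
    """
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_for_memory(probs, labels, per_class, strategy="bottom"):
    """Pick sample indices to store in a fixed-size, class-balanced memory.

    strategy="bottom" keeps the least uncertain samples of each class,
    strategy="top" keeps the most uncertain ones (hypothetical labels).
    """
    if strategy not in ("bottom", "top"):
        raise ValueError("strategy must be 'bottom' or 'top'")
    labels = np.asarray(labels)
    scores = predictive_entropy(probs)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Sort this class's samples by ascending uncertainty.
        order = idx[np.argsort(scores[idx], kind="stable")]
        chosen = order[:per_class] if strategy == "bottom" else order[-per_class:]
        keep.extend(chosen.tolist())
    return sorted(keep)
```

Calling `select_for_memory` with the model's softmax outputs on the current batch, once with `strategy="bottom"` and once with `strategy="top"`, gives the two competing memory contents whose effect on CF can then be compared.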