Deep Neural Networks (DNNs) deployed in the real world are regularly subject to out-of-distribution (OoD) data, various types of noise, and shifting conceptual objectives. This paper proposes a framework for adapting to data distribution drift, as modeled by benchmark Continual Learning datasets. We develop and evaluate a Continual Learning method that leverages uncertainty quantification from Bayesian inference to mitigate catastrophic forgetting. We extend previous approaches by removing the need for Monte Carlo sampling of the model weights to estimate the predictive distribution. Instead, we optimize a closed-form Evidence Lower Bound (ELBO) objective that approximates the predictive distribution by propagating the first two moments of a distribution, i.e., the mean and covariance, through all network layers. Catastrophic forgetting is mitigated by using the closed-form ELBO to approximate the Minimum Description Length (MDL) principle, which inherently penalizes changes in the model likelihood by minimizing the KL divergence between the variational posterior for the current task and the previous task's variational posterior, which acts as the prior. Leveraging this approximation of the MDL principle, we aim to first learn a sparse variational posterior and then minimize the additional model complexity learned for subsequent tasks. Our approach is evaluated in the task-incremental learning scenario using density-propagated versions of fully connected and convolutional neural networks across multiple sequential benchmark datasets with varying task sequence lengths. Ultimately, this procedure produces a minimally complex network over a series of tasks while mitigating catastrophic forgetting.
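To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes fully factorized (diagonal) Gaussians throughout, which is a simplification of the mean-and-covariance propagation described above, and the function names `linear_moments` and `kl_diag_gaussians` are hypothetical. The first function propagates a mean and variance through a linear layer in closed form, with no weight sampling; the second computes the KL term between the current task's posterior and the previous task's posterior used as the prior.

```python
import numpy as np


def linear_moments(mu_x, var_x, mu_w, var_w):
    """Closed-form output mean/variance of y = W x (illustrative sketch).

    Assumes all entries of W and x are mutually independent Gaussians
    (diagonal approximation). mu_w and var_w have shape (out, in);
    mu_x and var_x have shape (in,).
    """
    mu_y = mu_w @ mu_x
    # Var(w * x) = Var(w) * (E[x]^2 + Var(x)) + E[w]^2 * Var(x)
    var_y = var_w @ (mu_x ** 2 + var_x) + (mu_w ** 2) @ var_x
    return mu_y, var_y


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between factorized Gaussians, summed over dimensions."""
    mu_q, var_q, mu_p, var_p = map(np.asarray, (mu_q, var_q, mu_p, var_p))
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical posterior after the previous task (acts as the prior).
    mu_prev = rng.normal(size=(3, 4))
    var_prev = np.full((3, 4), 0.1)
    # Hypothetical posterior for the current task, slightly shifted.
    mu_curr = mu_prev + 0.05
    var_curr = np.full((3, 4), 0.1)

    x_mean, x_var = rng.normal(size=4), np.zeros(4)  # deterministic input
    mu_y, var_y = linear_moments(x_mean, x_var, mu_curr, var_curr)
    print("output mean:", mu_y)
    print("output variance:", var_y)
    print("KL(current || previous):",
          kl_diag_gaussians(mu_curr, var_curr, mu_prev, var_prev))
```

In a continual-learning loop of the kind the abstract describes, a KL term of this form would be added to the expected negative log-likelihood, so that moving the current posterior away from the previous task's posterior incurs a cost. Nonlinear activations and full covariance propagation need additional approximations that this sketch does not cover.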