We propose a Bayesian neural-network continual learning algorithm based on variational inference, aiming to overcome several drawbacks of existing methods. In continual learning scenarios, storing the network parameters at every step to retain knowledge imposes a heavy storage burden. This burden is compounded by the need to mitigate catastrophic forgetting: with only limited access to past datasets, it is difficult to maintain the correspondence between network parameters and datasets across all sessions. Existing variational-inference methods that regularize with the KL divergence risk catastrophic forgetting when uncertain nodes are updated and can induce coupled disruptions in certain nodes. To address these challenges, we propose the following strategies. To cut the storage cost of dense-layer parameters, we introduce a parameter-distribution learning method that significantly reduces storage requirements. Within the variational-inference continual learning framework, we introduce a regularization term that specifically targets the dynamics and distribution of the parameter means and variances, retaining the benefits of the KL divergence while avoiding its drawbacks. To ensure proper correspondence between network parameters and data, our method introduces an importance-weighted evidence lower bound (ELBO) term that captures the correlation between data and parameters, enabling storage of the bases of common and distinctive parameter hyperspaces. The proposed method partitions the parameter space into common and distinctive subspaces and establishes conditions for effective backward and forward knowledge transfer, elucidating the correspondence between network parameters and datasets. Experimental results demonstrate the effectiveness of our method across diverse datasets and various combinations of sequential datasets, yielding superior performance compared to existing approaches.
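As context for the objectives named above, the following is a minimal sketch in standard notation: the KL-regularized objective used in variational continual learning, whose forgetting behavior is critiqued here, and the K-sample importance-weighted ELBO that generalizes it. These are the standard forms from the literature, not the paper's modified regularizer; the symbols $q_t$, $\theta$, $\mathcal{D}_t$, and $K$ are notation assumed for illustration.

% Standard variational continual-learning objective at task t: maximize the
% expected log-likelihood of the current data D_t while keeping the new
% posterior q_t close (in KL) to the previous posterior q_{t-1}.
\mathcal{L}_{\mathrm{VCL}}(q_t) =
  \mathbb{E}_{\theta \sim q_t}\!\left[\log p(\mathcal{D}_t \mid \theta)\right]
  - \mathrm{KL}\!\left(q_t(\theta) \,\|\, q_{t-1}(\theta)\right)

% K-sample importance-weighted ELBO with q_{t-1} playing the role of the prior:
% a tighter bound on log p(D_t) that couples data and parameter samples through
% the importance weights. Setting K = 1 recovers L_VCL above.
\mathcal{L}_{K} =
  \mathbb{E}_{\theta_1,\dots,\theta_K \sim q_t}\!\left[
    \log \frac{1}{K} \sum_{k=1}^{K}
    \frac{p(\mathcal{D}_t \mid \theta_k)\, q_{t-1}(\theta_k)}{q_t(\theta_k)}
  \right]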