Work on continual learning (CL) has largely focused on the problems arising from a dynamically changing data distribution. However, CL can be decomposed into two sub-problems: (a) shifts in the data distribution, and (b) the fact that the data is split into chunks, so that only part of the data is available for training at any point in time. In this work, we look at the latter sub-problem, the chunking of data, and note that previous analysis of chunking in the CL literature is sparse. We show that chunking is an important part of CL, accounting for around half of the performance drop from offline learning in our experiments. Furthermore, our results reveal that current CL algorithms do not address the chunking sub-problem: they perform only as well as plain SGD training when there is no shift in the data distribution. We analyse why performance drops when learning occurs on chunks of data and find that forgetting, which is often seen as a problem caused by distribution shift, still arises and is a significant problem. Motivated by an analysis of the linear case, we show that per-chunk weight averaging improves performance in the chunking setting and that this improvement transfers to the full CL setting. Hence, we argue that work on chunking can help advance CL in general.
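To make the chunking setting and per-chunk weight averaging concrete, the following is a minimal sketch for a linear regression model, not the implementation used in this work. It assumes that "per-chunk weight averaging" means keeping a running mean of the weights obtained at the end of training on each chunk. The function names, the learning rate, the number of epochs, and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def make_data(n, true_w, noise, rng):
    """Draw n samples from y = true_w . x + Gaussian noise (no distribution shift)."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = sum(w * xi for w, xi in zip(true_w, x)) + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def sgd_on_chunk(w, chunk, lr, epochs, rng):
    """Plain SGD on squared error, with access to the current chunk only."""
    w = list(w)
    for _ in range(epochs):
        order = list(range(len(chunk)))
        rng.shuffle(order)
        for i in order:
            x, y = chunk[i]
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def train_chunked(data, chunk_size, dim, lr=0.05, epochs=20, seed=0):
    """Train chunk by chunk; return (last weights, mean of end-of-chunk weights).

    Assumption: the average is a uniform running mean over chunks.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    avg = [0.0] * dim
    n_chunks = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Training continues from the previous weights; earlier chunks are gone.
        w = sgd_on_chunk(w, chunk, lr, epochs, rng)
        n_chunks += 1
        # Incremental update of the mean of the end-of-chunk weights.
        avg = [a + (wi - a) / n_chunks for a, wi in zip(avg, w)]
    return w, avg


def sq_dist(a, b):
    """Squared Euclidean distance between two weight vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [1.0, -2.0, 0.5]
    data = make_data(200, true_w, noise=1.0, rng=rng)
    last_w, avg_w = train_chunked(data, chunk_size=10, dim=3)
    print("distance to true weights, last chunk only:", sq_dist(last_w, true_w))
    print("distance to true weights, per-chunk average:", sq_dist(avg_w, true_w))
```

Running the script prints the distance to the generating weights for both the final weights and the averaged weights, so the two can be compared on stationary data. With a single chunk covering all the data, the average reduces to the final weights.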