Class-incremental learning is one of the most important settings in continual learning research because it closely resembles real-world application scenarios. When memory is constrained, catastrophic forgetting arises as the number of classes or tasks grows. Continual learning in the video domain is even more challenging: each video contains a large number of frames, which places a heavier burden on the replay memory. The current common practice is to sub-sample frames from the video stream and store them in the replay memory. In this paper, we propose SMILE, a novel replay mechanism for effective video continual learning based on individual (single) frames. Through extensive experimentation, we show that under extreme memory constraints, video diversity plays a more significant role than temporal information. Our method therefore focuses on learning from a small number of frames that represent a large number of unique videos. On three representative video datasets, Kinetics, UCF101, and ActivityNet, the proposed method achieves state-of-the-art performance, outperforming the previous state of the art by up to 21.49%.
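The memory trade-off described above can be illustrated with a minimal sketch. It is not the SMILE implementation: the function names, the `(video_id, label, num_frames)` tuple format, and the random selection policy are all assumptions made for illustration. It shows only that, for a fixed frame budget, storing one frame per video covers more unique videos than storing several frames per video.

```python
import random


def build_replay_memory(videos, frame_budget, frames_per_video=1, seed=0):
    """Fill a replay memory holding at most `frame_budget` frames.

    `videos` is a list of (video_id, label, num_frames) tuples; this format
    is hypothetical and chosen only for the example.
    Returns a list of (video_id, label, frame_index) entries.
    """
    rng = random.Random(seed)
    order = list(videos)
    rng.shuffle(order)  # random video selection (an assumed policy)

    memory = []
    for video_id, label, num_frames in order:
        k = min(frames_per_video, num_frames)
        if len(memory) + k > frame_budget:
            break  # the next video would exceed the budget
        # Pick k distinct frame indices from this video.
        for idx in sorted(rng.sample(range(num_frames), k)):
            memory.append((video_id, label, idx))
    return memory


def unique_videos(memory):
    """Count how many distinct videos are represented in the memory."""
    return len({video_id for video_id, _, _ in memory})


# Toy data: 500 videos, 10 classes, 100 frames each.
videos = [(f"vid{i}", i % 10, 100) for i in range(500)]

single = build_replay_memory(videos, frame_budget=80, frames_per_video=1)
clips = build_replay_memory(videos, frame_budget=80, frames_per_video=8)

print(unique_videos(single), unique_videos(clips))  # prints "80 10"
```

Both memories hold the same 80 frames, but the single-frame memory represents eight times as many videos; this is the diversity-over-temporal-information trade-off that the abstract argues for under extreme memory constraints.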