Recently, a new Continual Learning (CL) paradigm, called Interval Continual Learning (InterContiNet), was introduced to control catastrophic forgetting; it relies on enforcing interval constraints on the neural network parameter space. Unfortunately, InterContiNet training is challenging due to the high dimensionality of the weight space, which makes the intervals difficult to manage. To address this issue, we introduce HyperInterval, a technique that employs interval arithmetic within the embedding space and uses a hypernetwork to map these intervals to the target network's parameter space. We train interval embeddings for consecutive tasks and a hypernetwork that transforms these embeddings into the weights of the target network. The embedding for a given task is trained jointly with the hypernetwork while preserving the target network's responses to the embeddings of previous tasks. Interval arithmetic thus operates in a more manageable, lower-dimensional embedding space rather than directly constructing intervals in the high-dimensional weight space, which allows faster and more efficient training. Furthermore, HyperInterval maintains the guarantee of not forgetting. At the end of training, we can choose a single universal embedding to produce one network dedicated to all tasks. In this framework, the hypernetwork is used only during training and can be seen as a meta-trainer. HyperInterval obtains significantly better results than InterContiNet and achieves SOTA results on several benchmarks.
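The core idea above can be sketched with standard interval arithmetic in center-radius form: a low-dimensional embedding interval is pushed through a (here, purely illustrative) linear hypernetwork, yielding an interval over target-network weights. All dimensions and the linear map are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the embedding space is low-dimensional,
# the target weight space is high-dimensional (the paper's motivation).
emb_dim, weight_dim = 4, 64

# Task embedding interval in center-radius form: [center - r, center + r].
e_center = rng.normal(size=emb_dim)
e_radius = np.full(emb_dim, 0.1)

# Illustrative linear hypernetwork mapping embeddings to target weights.
H = rng.normal(size=(weight_dim, emb_dim))
b = rng.normal(size=weight_dim)

# Interval propagation through a linear map: the center is mapped
# directly, while radii are scaled by the absolute values of H.
w_center = H @ e_center + b
w_radius = np.abs(H) @ e_radius
w_lo, w_hi = w_center - w_radius, w_center + w_radius

# Soundness check: any embedding inside the input interval maps to
# weights inside [w_lo, w_hi].
e_sample = e_center + rng.uniform(-1.0, 1.0, emb_dim) * e_radius
w_sample = H @ e_sample + b
print(bool(np.all((w_lo <= w_sample) & (w_sample <= w_hi))))  # True
```

The hedged point of the sketch: interval bookkeeping happens only over `emb_dim` coordinates, while the induced weight interval in the `weight_dim`-dimensional space comes for free from the propagation rule.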