The problem of class incremental learning (CIL) is considered. State-of-the-art approaches use a dynamic architecture based on network expansion (NE), in which a task expert is added per task. While effective from a computational standpoint, these methods lead to models that grow quickly with the number of tasks. A new NE method, dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity. This is accomplished by introducing dense connections between the intermediate layers of the task expert networks, which enable the transfer of knowledge from old to new tasks via feature sharing and reuse. This sharing is implemented with a cross-task attention mechanism, based on a new task attention block (TAB), that fuses information across tasks. Unlike traditional attention mechanisms, TAB operates at the level of feature mixing and is decoupled from spatial attention. This is shown to be more effective than joint spatial-and-task attention for CIL. The proposed DNE approach can strictly maintain the feature space of old classes while growing the network and feature scale at a much slower rate than previous methods. As a result, it outperforms previous SOTA methods by a margin of 4% in accuracy, with a similar or even smaller model scale.
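To make the idea of task attention decoupled from spatial attention concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the internals of TAB, so everything here is an assumption made for illustration: single-head scaled dot-product attention, the function name `task_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the choice to form the query from the newest task expert only. The point it shows is that attention weights are computed over the task axis separately at each spatial position, so no information is mixed across positions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def task_attention(features, w_q, w_k, w_v):
    """Hypothetical task-axis attention (illustrative assumption, not TAB itself).

    features: (T, N, D) array with T task experts, N spatial tokens, D channels.
              The last expert (index T - 1) is taken to be the new task.
    w_q, w_k: (D, d) projections; w_v: (D, D) projection.
    Returns the fused new-task features (N, D) and the attention weights (N, T).
    """
    d = w_q.shape[1]
    # Rearrange to (N, T, D): each spatial position holds T task tokens.
    x = features.transpose(1, 0, 2)
    # Assumption: only the new expert issues a query; the old experts'
    # features are read but never written.
    q = x[:, -1:, :] @ w_q                           # (N, 1, d)
    k = x @ w_k                                      # (N, T, d)
    v = x @ w_v                                      # (N, T, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (N, 1, T)
    weights = softmax(scores, axis=-1)               # normalized over tasks
    fused = weights @ v                              # (N, 1, D)
    return fused[:, 0, :], weights[:, 0, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N, D, d = 3, 4, 8, 5                          # arbitrary toy sizes
    feats = rng.normal(size=(T, N, D))
    w_q = rng.normal(size=(D, d))
    w_k = rng.normal(size=(D, d))
    w_v = rng.normal(size=(D, D))
    fused, weights = task_attention(feats, w_q, w_k, w_v)
    print(fused.shape, weights.shape)
```

Because the softmax runs only over the task axis, permuting the spatial tokens of the input permutes the output in the same way, which is one way to check that the sketch keeps task attention separate from spatial attention. A joint spatial-and-task variant would instead flatten the `T * N` tokens into a single sequence before attending.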