Revisiting Neural Networks for Continual Learning: An Architectural Perspective

Efforts to overcome catastrophic forgetting have primarily centered on developing more effective Continual Learning (CL) methods. In contrast, less attention has been devoted to analyzing how network architecture design (e.g., network depth, width, and components) contributes to CL. This paper seeks to bridge this gap between network architecture design and CL, and to present a holistic study of the impact of network architectures on CL. This work considers architecture design at the network scaling level, i.e., width and depth, and at the network component level, i.e., skip connections, global pooling layers, and down-sampling. In both cases, we first derive insights by systematically exploring how architectural designs affect CL. Then, grounded in these insights, we craft a specialized search space for CL and further propose a simple yet effective ArchCraft method to steer the design toward CL-friendly architectures; specifically, this method recrafts AlexNet/ResNet into AlexAC/ResAC. Experimental validation across various CL settings and scenarios demonstrates that the improved architectures are parameter-efficient, achieving state-of-the-art CL performance while being 86%, 61%, and 97% more compact in terms of parameters than the naive CL architecture in Task IL and Class IL. Code is available at https://github.com/byyx666/ArchCraft.
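To make the "more compact in terms of parameters" comparison concrete, the following is a minimal sketch of how width and depth drive the parameter count of a plain convolutional stack, and how a relative compactness percentage can be computed from two counts. It is not the ArchCraft method and not code from the linked repository; the function names (`conv_params`, `plain_stack_params`, `compactness_percent`), the 3x3 kernel size, and the global-average-pooling classifier head are all assumptions made for illustration.

```python
def conv_params(c_in, c_out, kernel=3, bias=True):
    """Parameters of one 2D convolution: weights plus optional bias."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


def plain_stack_params(width, depth, in_channels=3, num_classes=10):
    """Parameter count of a hypothetical plain network:
    `depth` conv layers of `width` channels each, followed by a linear
    classifier applied after global average pooling (assumed head)."""
    total = 0
    c_in = in_channels
    for _ in range(depth):
        total += conv_params(c_in, width)
        c_in = width
    # Linear classifier: width -> num_classes, with bias.
    total += width * num_classes + num_classes
    return total


def compactness_percent(baseline_params, candidate_params):
    """How much smaller the candidate is than the baseline, in percent."""
    return 100.0 * (baseline_params - candidate_params) / baseline_params


if __name__ == "__main__":
    # Hypothetical configurations, chosen only to show the scaling trend.
    wide_shallow = plain_stack_params(width=128, depth=4)
    narrow_deep = plain_stack_params(width=32, depth=8)
    print("wide/shallow parameters:", wide_shallow)
    print("narrow/deep parameters:", narrow_deep)
    print("relative compactness (%):",
          compactness_percent(wide_shallow, narrow_deep))
```

In this sketch the hidden-layer cost grows quadratically with width but only linearly with depth, which is why the two scaling axes trade off so differently under a fixed parameter budget. The configurations in the `__main__` block are illustrative and do not correspond to any architecture reported in the paper.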

