On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers

State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting. However, these methods trade off the number of learned parameters against performance: strong results require many learned parameters, which makes such models computationally expensive. In this work, we aim to reduce this cost while maintaining competitive performance. We achieve this by revisiting and extending a simple transfer learning idea: learning task-specific normalization layers. Specifically, we tune the scale and bias parameters of LayerNorm for each continual learning task and, at inference time, select among them based on the similarity between task-specific keys and the output of the pre-trained model. To make the classifier robust to incorrect parameter selection during inference, we introduce a two-stage training procedure: we first optimize the task-specific parameters, and then train the classifier using the same selection procedure applied at inference time. Experiments on ImageNet-R and CIFAR-100 show that our method achieves results superior to or on par with the state of the art while being computationally cheaper.
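The per-task LayerNorm parameters and the key-based selection described above can be illustrated with a small sketch. It is not the authors' implementation: the names `TaskNormBank`, `select_task`, `layer_norm` and `cosine_similarity` are hypothetical, the similarity is assumed to be cosine similarity, the query is assumed to be a single feature vector taken from the frozen pre-trained model, and only one LayerNorm is shown rather than every LayerNorm in the Vision Transformer. Keys and scale/bias values are set by hand here instead of being learned.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def layer_norm(x: Sequence[float], gamma: Sequence[float], beta: Sequence[float],
               eps: float = 1e-6) -> Vector:
    """LayerNorm over one feature vector: normalize, then apply scale (gamma) and bias (beta)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure between a query and a task key."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b + 1e-12)


class TaskNormBank:
    """Hypothetical container holding one key and one (gamma, beta) pair per task."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.keys: List[Vector] = []
        self.params: List[Tuple[Vector, Vector]] = []

    def add_task(self, key: Vector, gamma: Vector, beta: Vector) -> int:
        # In the method, gamma/beta would be tuned on the task's data (stage one)
        # and the key learned for that task; here they are simply stored.
        self.keys.append(list(key))
        self.params.append((list(gamma), list(beta)))
        return len(self.keys) - 1

    def select_task(self, query: Sequence[float]) -> int:
        # query: output of the frozen pre-trained model for the current input.
        scores = [cosine_similarity(query, k) for k in self.keys]
        return max(range(len(scores)), key=scores.__getitem__)

    def normalize(self, x: Sequence[float], query: Sequence[float]) -> Tuple[int, Vector]:
        # Stage two would train the classifier on features produced through this
        # same selection path, so it also sees inputs whose parameters were mis-selected.
        task = self.select_task(query)
        gamma, beta = self.params[task]
        return task, layer_norm(x, gamma, beta)


# Usage: two tasks with hand-set keys and LayerNorm parameters.
bank = TaskNormBank(dim=3)
bank.add_task(key=[1.0, 0.0, 0.0], gamma=[1.0, 1.0, 1.0], beta=[0.0, 0.0, 0.0])
bank.add_task(key=[0.0, 1.0, 0.0], gamma=[2.0, 2.0, 2.0], beta=[0.5, 0.5, 0.5])
task, y = bank.normalize(x=[1.0, 2.0, 3.0], query=[0.1, 0.9, 0.0])
print(task)  # 1: the query is closest to the second task's key
```

The design keeps the backbone weights shared and stores only a key plus two vectors of size `dim` per task (per LayerNorm), which is the source of the parameter savings relative to learning task-specific prompts.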
