Transformer-based pre-trained models with millions of parameters require large storage. Recent approaches address this shortcoming by training adapters, but these approaches still require a relatively large number of parameters. In this study, we propose AdapterBias, a surprisingly simple yet effective adapter architecture. AdapterBias adds a token-dependent shift to the hidden output of transformer layers, adapting to downstream tasks with only a vector and a linear layer. We conduct extensive experiments to demonstrate the effectiveness of AdapterBias. The experiments show that the proposed method dramatically reduces the number of trainable parameters compared with previous work, with only a minimal decrease in task performance compared with fine-tuned pre-trained models. We further find that AdapterBias automatically learns to assign larger representation shifts to the tokens related to the task under consideration.
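To make the "vector and a linear layer" description concrete, here is a minimal pure-Python sketch of one way such a token-dependent shift could be computed. It assumes that the linear layer maps each token's hidden representation (dimension r) to a single scalar weight alpha_i, that this linear layer includes a bias term, and that the shifted output is h_i + alpha_i * v for a shared task-specific vector v. The function names `adapter_bias` and `adapter_bias_param_count` are hypothetical, and the sketch takes a generic hidden output because the abstract does not say which sublayer the shift is attached to. It is an illustration under these assumptions, not the authors' implementation.

```python
from typing import List

Vector = List[float]


def adapter_bias(hidden: List[Vector], v: Vector, w: Vector, b: float) -> List[Vector]:
    """Hypothetical sketch: apply a token-dependent shift to one layer's hidden output.

    hidden: m token representations, each of dimension r (backbone output, frozen).
    v:      shared task-specific shift vector of dimension r (trainable).
    w, b:   weights and bias of an assumed linear layer r -> 1 that produces
            one scalar weight alpha_i per token (trainable).
    """
    r = len(v)
    if len(w) != r:
        raise ValueError("w and v must both have the hidden dimension r")
    shifted = []
    for h in hidden:
        if len(h) != r:
            raise ValueError("every token representation must have dimension r")
        # alpha_i = w . h_i + b: how strongly this particular token is shifted
        alpha = sum(wj * hj for wj, hj in zip(w, h)) + b
        # h_i' = h_i + alpha_i * v: all tokens move along the same direction v,
        # but by a token-dependent amount
        shifted.append([hj + alpha * vj for hj, vj in zip(h, v)])
    return shifted


def adapter_bias_param_count(r: int) -> int:
    """Trainable parameters per adapted layer under the assumptions above."""
    # v contributes r entries; the linear layer has r weights plus one bias.
    return r + (r + 1)


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 2.0]]
    out = adapter_bias(tokens, v=[0.5, -0.5], w=[1.0, 1.0], b=0.0)
    print(out)                            # [[1.5, -0.5], [1.0, 1.0]]
    print(adapter_bias_param_count(768))  # 1537
```

In the usage example the second token receives twice the shift of the first because its computed alpha is 2.0 rather than 1.0, which mirrors the idea of assigning larger shifts to some tokens than others. Under these assumptions, a layer with hidden size 768 would add only 2r + 1 = 1537 trainable parameters, since the backbone weights stay frozen.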