Adapters have been positioned as a parameter-efficient fine-tuning (PEFT) approach, whereby a minimal number of parameters are added to the model and fine-tuned. However, adapters have not been sufficiently analyzed to determine whether PEFT translates to benefits in training/deployment efficiency and maintainability/extensibility. Through extensive experiments on many adapters, tasks, and languages in supervised and cross-lingual zero-shot settings, we show that for Natural Language Understanding (NLU) tasks, the parameter efficiency of adapters does not translate to efficiency gains compared to full fine-tuning of models. More precisely, adapters are relatively expensive to train and have slightly higher deployment latency. Furthermore, the maintainability/extensibility benefits of adapters can be achieved with simpler approaches such as multi-task training via full fine-tuning, which also trains faster. We therefore recommend that, for moderately sized models on NLU tasks, practitioners rely on full fine-tuning or multi-task training rather than adapters. Our code is available at https://github.com/AI4Bharat/adapter-efficiency.
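To make "parameter efficiency" concrete, the following is a minimal back-of-the-envelope sketch, not taken from the paper or its repository. It assumes a BERT-base-like encoder (hidden size 768, feed-forward size 3072, 12 layers), one bottleneck adapter per layer with bottleneck size 64, and it ignores embeddings. All function names and sizes are hypothetical and chosen only for illustration.

```python
def adapter_params(hidden: int, bottleneck: int) -> int:
    """Parameters in one bottleneck adapter: down-projection plus up-projection."""
    down = hidden * bottleneck + bottleneck   # weights + bias
    up = bottleneck * hidden + hidden         # weights + bias
    return down + up


def encoder_layer_params(hidden: int, ffn: int) -> int:
    """Approximate parameters in one standard Transformer encoder layer."""
    attention = 4 * (hidden * hidden + hidden)                   # Q, K, V, output projections
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    layer_norms = 2 * 2 * hidden                                 # two LayerNorms, scale + shift
    return attention + feed_forward + layer_norms


# Assumed sizes (illustrative only, not reported in the abstract).
hidden, ffn, bottleneck, layers = 768, 3072, 64, 12

added = layers * adapter_params(hidden, bottleneck)     # trainable with adapters
frozen = layers * encoder_layer_params(hidden, ffn)     # trainable with full fine-tuning

print(f"adapter parameters:  {added:,}")
print(f"encoder parameters:  {frozen:,}")
print(f"trainable fraction:  {added / frozen:.2%}")
```

Under these assumptions the adapters add roughly 1.4% of the encoder's parameters. The count covers only what is updated: every forward pass still runs through all frozen layers plus the adapter layers, so a small trainable fraction does not by itself imply lower training or inference cost.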