Composable Sparse Fine-Tuning for Cross-Lingual Transfer

Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream approach for transfer learning. To increase its efficiency and prevent catastrophic forgetting and interference, techniques like adapters and sparse fine-tuning have been developed. Adapters are modular, as they can be combined to adapt a model towards different facets of knowledge (e.g., dedicated language and/or task adapters). Sparse fine-tuning is expressive, as it controls the behavior of all model components. In this work, we introduce a new fine-tuning method with both these desirable properties. In particular, we learn sparse, real-valued masks based on a simple variant of the Lottery Ticket Hypothesis. Task-specific masks are obtained from annotated data in a source language, and language-specific masks from masked language modeling in a target language. Both these masks can then be composed with the pretrained model. Unlike adapter-based fine-tuning, this method neither increases the number of parameters at inference time nor alters the original model architecture. Most importantly, it outperforms adapters in zero-shot cross-lingual transfer by a large margin in a series of multilingual benchmarks, including Universal Dependencies, MasakhaNER, and AmericasNLI. Based on an in-depth analysis, we additionally find that sparsity is crucial to prevent both 1) interference between the fine-tunings to be composed and 2) overfitting. We release the code and models at https://github.com/cambridgeltl/composable-sft.
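The composition step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: that each sparse fine-tuning is stored as a difference vector relative to the pretrained weights, that the parameters to keep are the ones that changed most during a full fine-tuning run, and that composition is plain addition. The function names (`top_k_mask`, `sparse_diff`, `compose`) and the toy numbers are hypothetical, and the second training phase that a Lottery Ticket style procedure would normally include is omitted for brevity.

```python
def top_k_mask(pretrained, finetuned, k):
    """Binary mask marking the k parameters that moved most (assumed selection rule)."""
    deltas = [abs(f - p) for p, f in zip(pretrained, finetuned)]
    keep = sorted(range(len(deltas)), key=lambda i: deltas[i], reverse=True)[:k]
    mask = [0] * len(pretrained)
    for i in keep:
        mask[i] = 1
    return mask


def sparse_diff(pretrained, finetuned, mask):
    """Real-valued sparse difference vector: zero wherever the mask is zero."""
    return [(f - p) * m for p, f, m in zip(pretrained, finetuned, mask)]


def compose(pretrained, *diffs):
    """Add any number of sparse difference vectors to the pretrained weights."""
    out = list(pretrained)
    for diff in diffs:
        out = [o + d for o, d in zip(out, diff)]
    return out


# Toy flat "model" with six parameters (illustrative values only).
pretrained = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
task_tuned = [0.5, -0.2, 2.0, 0.1, 1.5, -1.5]   # stands in for source-language task training
lang_tuned = [1.5, -1.0, 2.05, 0.0, 0.5, -0.5]  # stands in for target-language MLM training

task_mask = top_k_mask(pretrained, task_tuned, k=2)
lang_mask = top_k_mask(pretrained, lang_tuned, k=2)
task_diff = sparse_diff(pretrained, task_tuned, task_mask)
lang_diff = sparse_diff(pretrained, lang_tuned, lang_mask)

composed = compose(pretrained, task_diff, lang_diff)
overlap = sum(t and l for t, l in zip(task_mask, lang_mask))
```

In this toy setting the composed model has exactly as many parameters as the pretrained one, which mirrors the claim that nothing is added at inference time. The two masks also touch disjoint positions (`overlap` is zero), a small-scale picture of why sparsity could limit interference between the fine-tunings being composed.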

