Despite their popularity in non-English NLP, multilingual language models often underperform monolingual ones due to inter-language competition for model parameters. We propose Cross-lingual Expert Language Models (X-ELM), which mitigate this competition by independently training language models on subsets of the multilingual corpus. This process specializes individual X-ELMs to different languages, while together they remain effective as a multilingual ensemble. Our experiments show that, given the same compute budget, X-ELM outperforms jointly trained multilingual models across all considered languages and that these gains transfer to downstream tasks. X-ELM provides benefits beyond performance improvements: new experts can be iteratively added, adapting X-ELM to new languages without catastrophic forgetting. Furthermore, training is asynchronous, reducing the hardware requirements for multilingual training and democratizing multilingual modeling.