Traditional multitask learning methods can typically exploit common knowledge only task-wise or language-wise, so they lose either cross-language or cross-task knowledge. This paper proposes a general multilingual multitask model, named SkillNet-X, which enables a single model to tackle many different tasks in different languages. To this end, we define several language-specific skills and task-specific skills, each of which corresponds to a skill module. SkillNet-X sparsely activates only the skill modules that are relevant to either the target task or the target language. Acting as knowledge transit hubs, skill modules can absorb task-related knowledge and language-related knowledge in succession. Building on the Transformer, we modify the multi-head attention layer and the feed-forward network layer to accommodate skill modules. We evaluate SkillNet-X on eleven natural language understanding datasets in four languages. Results show that SkillNet-X performs better than task-specific baselines and two multitask learning baselines (i.e., a dense joint model and a Mixture-of-Experts model). Furthermore, skill pre-training improves the performance of SkillNet-X on almost all datasets. To investigate the generalization of our model, we conduct experiments on two new tasks and find that SkillNet-X significantly outperforms the baselines.
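To make the idea of sparse skill activation concrete, here is a minimal pure-Python sketch of a feed-forward layer that holds one module per skill and runs only the modules for the target task and the target language on a given input. It is an illustration under stated assumptions, not the SkillNet-X implementation. The class names `FeedForward` and `SkillFFN`, the hypothetical skill labels such as `"task:ner"` and `"lang:zh"`, the random initialization, and the choice to average the outputs of the active modules are all assumptions. The modified multi-head attention layer is not shown.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class FeedForward:
    """A standard two-layer Transformer FFN: W2 . ReLU(W1 . x)."""

    def __init__(self, d_model: int, d_hidden: int, rng: random.Random) -> None:
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]

    def __call__(self, x: Vector) -> Vector:
        hidden = [max(0.0, h) for h in matvec(self.w1, x)]  # ReLU activation
        return matvec(self.w2, hidden)


class SkillFFN:
    """Hypothetical FFN layer with one skill module per task or language skill."""

    def __init__(self, skills: List[str], d_model: int, d_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.modules: Dict[str, FeedForward] = {
            s: FeedForward(d_model, d_hidden, rng) for s in skills
        }

    def active_skills(self, task: str, language: str) -> List[str]:
        # Sparse activation: keep only the modules tied to the target task or language.
        return [s for s in (task, language) if s in self.modules]

    def __call__(self, x: Vector, task: str, language: str) -> Vector:
        active = self.active_skills(task, language)
        if not active:
            raise ValueError(f"no skill module for task={task!r}, language={language!r}")
        outputs = [self.modules[s](x) for s in active]  # inactive modules never run
        # Assumption: combine the active modules by averaging their outputs element-wise.
        return [sum(vals) / len(outputs) for vals in zip(*outputs)]


# Usage: four hypothetical skills, of which only two are activated for this input.
skills = ["lang:en", "lang:zh", "task:classification", "task:ner"]
layer = SkillFFN(skills, d_model=4, d_hidden=8)
x = [0.5, -0.2, 0.1, 0.3]
print(layer.active_skills("task:ner", "lang:zh"))  # ['task:ner', 'lang:zh']
y = layer(x, "task:ner", "lang:zh")
print(len(y))  # 4
```

Because every module shares the same input and output width, the active modules can be swapped per example without changing the surrounding Transformer block. Only the selected modules receive gradients for that example, which is one way a module could accumulate knowledge from every task or language that activates it.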