Multilingual understanding models (i.e., encoder-based models such as mBERT), pre-trained via masked language modeling, have achieved promising results on many language understanding tasks. However, these non-autoregressive (NAR) models still struggle to generate high-quality text compared with autoregressive (AR) models. Because encoder-based models offer efficient generation and self-correction abilities, this paper explores methods to endow multilingual understanding models with generation abilities, yielding a unified model. Specifically, we start from a multilingual encoder (XLM-R) and propose a \textbf{S}emantic-\textbf{G}uided \textbf{A}lignment-then-Denoising (SGA) approach to adapt an encoder to a multilingual generator with a small number of new parameters. Experiments show that the proposed approach is an effective adaptation method, outperforming widely used initialization-based methods with gains of 9.4 BLEU on machine translation, 8.1 Rouge-L on question generation, and 5.5 METEOR on story generation on XLM-R$_{large}$. On the other hand, we observe that XLM-R is still inferior to mBART in supervised settings despite better results in zero-shot settings, indicating that more exploration is required to make understanding models strong generators.