To adapt text summarization to the multilingual world, previous work has proposed multi-lingual summarization (MLS) and cross-lingual summarization (CLS). However, because the two tasks are defined differently, they have been studied separately, which hinders compatible and systematic research on both. In this paper, we aim to unify MLS and CLS into a more general setting, i.e., many-to-many summarization (M2MS), where a single model can process documents in any language and generate their summaries in any language. As a first step towards M2MS, we conduct preliminary studies showing that M2MS transfers task knowledge across languages better than MLS and CLS do. Furthermore, we propose Pisces, a pre-trained M2MS model that learns language modeling, cross-lingual ability, and summarization ability through three-stage pre-training. Experimental results indicate that Pisces significantly outperforms state-of-the-art baselines, especially in zero-shot directions, i.e., directions for which there is no training data from source-language documents to target-language summaries.
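To make the direction terminology concrete, the following is a minimal illustrative sketch, not part of Pisces or its code. It assumes the usual reading that an MLS direction keeps the document and summary in the same language while a CLS direction crosses languages. The function `classify_directions`, the language codes, and the training pairs are all hypothetical. It enumerates every (source, target) direction an M2MS model covers and flags as zero-shot any direction with no training data.

```python
from itertools import product


def classify_directions(languages, trained_pairs):
    """Enumerate every (source, target) direction an M2MS model covers.

    Hypothetical helper for illustration only. Each direction is tagged
    "MLS" (same language) or "CLS" (different languages), together with a
    flag that is True when no training data exists for that exact pair.
    """
    trained = set(trained_pairs)
    table = {}
    for src, tgt in product(languages, repeat=2):
        kind = "MLS" if src == tgt else "CLS"
        table[(src, tgt)] = (kind, (src, tgt) not in trained)
    return table


# Hypothetical setup: three languages, training data for only four directions.
langs = ["en", "de", "zh"]
train = [("en", "en"), ("de", "de"), ("en", "zh"), ("zh", "en")]

directions = classify_directions(langs, train)
zero_shot = sorted(d for d, (_, is_zero_shot) in directions.items() if is_zero_shot)

print(len(directions))  # 9
print(zero_shot)  # [('de', 'en'), ('de', 'zh'), ('en', 'de'), ('zh', 'de'), ('zh', 'zh')]
```

With three languages a single M2MS model covers all nine directions. Five of them are zero-shot in this toy setup, including a same-language direction (`zh` to `zh`), so zero-shot directions can arise on both the MLS and the CLS side.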