Towards Unifying Multi-Lingual and Cross-Lingual Summarization

To adapt text summarization to the multilingual world, previous work has proposed multi-lingual summarization (MLS) and cross-lingual summarization (CLS). However, the two tasks have been studied separately because of their different definitions, which limits compatible and systematic research on both. In this paper, we aim to unify MLS and CLS into a more general setting, many-to-many summarization (M2MS), in which a single model can process documents in any language and generate their summaries in any language. As a first step towards M2MS, we conduct preliminary studies showing that M2MS transfers task knowledge across languages better than MLS and CLS do. Furthermore, we propose Pisces, a pre-trained M2MS model that learns language modeling, cross-lingual ability, and summarization ability through three-stage pre-training. Experimental results indicate that Pisces significantly outperforms the state-of-the-art baselines, especially in the zero-shot directions, where no training data exists from the source-language documents to the target-language summaries.
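The relationship between the three settings can be made concrete by enumerating the (source language, target language) directions each one covers. The sketch below is only an illustration under stated assumptions: the language codes, the function names `directions` and `zero_shot_directions`, and the example training set are all hypothetical, and it assumes the usual reading in which MLS covers same-language directions and CLS covers different-language directions. It is not the authors' implementation.

```python
from itertools import product

# Hypothetical language set, chosen only for illustration.
LANGS = ["en", "fr", "zh"]


def directions(langs, setting):
    """Return the (source, target) language pairs covered by a task setting."""
    pairs = list(product(langs, repeat=2))
    if setting == "MLS":
        # Assumed reading: the summary is in the same language as the document.
        return [(src, tgt) for src, tgt in pairs if src == tgt]
    if setting == "CLS":
        # Assumed reading: the summary is in a different language.
        return [(src, tgt) for src, tgt in pairs if src != tgt]
    if setting == "M2MS":
        # Any source language to any target language.
        return pairs
    raise ValueError(f"unknown setting: {setting}")


def zero_shot_directions(langs, trained):
    """Return the M2MS directions that have no training data."""
    seen = set(trained)
    return [pair for pair in directions(langs, "M2MS") if pair not in seen]


if __name__ == "__main__":
    for setting in ("MLS", "CLS", "M2MS"):
        print(setting, directions(LANGS, setting))
    # Hypothetical training data covering only two directions.
    print(zero_shot_directions(LANGS, [("en", "en"), ("en", "fr")]))
```

Under these assumptions, the M2MS directions are exactly the union of the MLS and CLS directions, and a zero-shot direction is any pair missing from the training data.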

