Recent advances in deep learning, and especially the invention of encoder-decoder architectures, have significantly improved the performance of abstractive summarization systems. However, the majority of research has focused on written documents, neglecting the problem of multi-party dialogue summarization. In this paper, we present a dataset of French political debates for the purpose of enhancing resources for multilingual dialogue summarization. Our dataset consists of manually transcribed and annotated political debates, covering a range of topics and perspectives. We highlight the importance of high-quality transcription and annotations for training accurate and effective dialogue summarization models, and emphasize the need for multilingual resources to support dialogue summarization in non-English languages. We also provide baseline experiments using state-of-the-art methods, and encourage further research in this area to advance the field of dialogue summarization. Our dataset will be made publicly available for use by the research community.