MUG: A General Meeting Understanding and Generation Benchmark

Listening to long video or audio recordings of video conferences and online courses to acquire information is extremely inefficient. Even after automatic speech recognition (ASR) systems transcribe the recordings into long-form spoken language documents, reading the transcripts only partly speeds up information seeking. A range of NLP applications, such as keyphrase extraction, topic segmentation, and summarization, have been observed to significantly improve users' efficiency in grasping important information. The meeting scenario is among the most valuable for deploying these spoken language processing (SLP) capabilities. However, the lack of large-scale public meeting datasets annotated for these SLP tasks severely hinders their advancement. To promote SLP advancement, we establish a large-scale general Meeting Understanding and Generation Benchmark (MUG) that benchmarks a wide range of SLP tasks: topic segmentation, topic-level and session-level extractive summarization, topic title generation, keyphrase extraction, and action item detection. To facilitate the MUG benchmark, we construct and release the AliMeeting4MUG Corpus, a large-scale meeting dataset for comprehensive long-form SLP development. It consists of 654 recorded Mandarin meeting sessions with diverse topic coverage, with manual annotations for the SLP tasks made on manual transcripts of the meeting recordings. To the best of our knowledge, the AliMeeting4MUG Corpus is so far the largest meeting corpus in scale and facilitates the most SLP tasks. In this paper, we provide a detailed introduction to the corpus, the SLP tasks and evaluation methods, and the baseline systems and their performance.
