MUG: A General Meeting Understanding and Generation Benchmark

Listening to long video or audio recordings of video conferences and online courses to acquire information is extremely inefficient. Even after automatic speech recognition (ASR) systems transcribe recordings into long-form spoken language documents, reading the ASR transcripts only partly speeds up information seeking. It has been observed that a range of NLP applications, such as keyphrase extraction, topic segmentation, and summarization, significantly improve users' efficiency in grasping important information. The meeting scenario is among the most valuable scenarios for deploying these spoken language processing (SLP) capabilities. However, the lack of large-scale public meeting datasets annotated for these SLP tasks severely hinders their advancement. To promote SLP advancement, we establish a large-scale general Meeting Understanding and Generation Benchmark (MUG) to benchmark the performance of a wide range of SLP tasks, including topic segmentation, topic-level and session-level extractive summarization, topic title generation, keyphrase extraction, and action item detection. To facilitate the MUG benchmark, we construct and release the AliMeeting4MUG Corpus, a large-scale meeting dataset for comprehensive long-form SLP development. The corpus consists of 654 recorded Mandarin meeting sessions with diverse topic coverage, with manual annotations for the SLP tasks made on manual transcripts of the meeting recordings. To the best of our knowledge, the AliMeeting4MUG Corpus is so far the largest meeting corpus in scale and supports the most SLP tasks. In this paper, we provide a detailed introduction to the corpus, the SLP tasks and evaluation methods, and the baseline systems and their performance.
