The dynamic nature of esports makes game situations hard for average viewers to follow. Esports broadcasts feature expert casters, but caster commentary alone is not enough to fully convey the game situation. Understanding becomes richer when diverse multimodal esports information is included, such as audience conversation and emotions, game audio, and match event information. This paper introduces GAME-MUG, a new dataset for multimodal game situation understanding and audience-engaged commentary generation, together with a strong baseline. The dataset is collected from 2020-2022 League of Legends (LoL) live streams on YouTube and Twitch and includes multimodal esports game information, namely text, audio, and time-series event logs, for detecting the game situation. We also propose a new audience-conversation-augmented commentary dataset that covers both game situation and audience conversation understanding, and we introduce a robust joint multimodal dual learning model as a baseline. We evaluate the model's game situation/event understanding and commentary generation capabilities to show the effectiveness of covering multiple modalities and of the joint integration learning approach.