Soccer commentary plays a crucial role in the viewing experience of a soccer game. Previous studies in automatic soccer commentary generation typically adopt an end-to-end method to generate anonymous live text commentary. Such commentary falls short of real-world live televised commentary: it contains anonymous entities and context-dependent errors, and it lacks statistical insights into game events. To bridge this gap, we propose GameSight, a two-stage model that treats soccer commentary generation as a knowledge-enhanced visual reasoning task, enabling knowledgeable, live-televised-style commentary with accurate references to entities (players and teams). GameSight first performs visual reasoning to align anonymous entities through fine-grained visual and contextual analysis. The entity-aligned commentary is then refined with knowledge by incorporating external historical statistics and iteratively updated internal game state information. As a result, GameSight improves player alignment accuracy by 18.5% on the SN-Caption-test-align dataset compared to Gemini 2.5-pro. Combined with further knowledge enhancement, GameSight also performs better in segment-level accuracy and commentary quality, as well as in game-level contextual relevance and structural composition. We believe that our work paves the way for a more informative and engaging human-centric experience with AI sports applications. Demo Page: https://gamesight2025.github.io/gamesight2025
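To make the two-stage structure concrete, the following is a minimal sketch of how entity alignment and knowledge refinement could be chained. It is not the GameSight implementation: the names `GameState`, `align_entities`, and `refine_with_knowledge`, the `[PLAYER]`/`[TEAM]` placeholder format, and the statistics dictionary are all assumptions made for illustration, and the visual reasoning step is replaced by a mapping that is simply passed in.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical internal game state, updated after every segment."""
    goals: dict = field(default_factory=dict)  # player name -> goals so far

    def update(self, event: str, player: str) -> None:
        # Only goals are tracked in this sketch.
        if event == "goal":
            self.goals[player] = self.goals.get(player, 0) + 1


def align_entities(anonymous: str, entities: dict) -> str:
    """Stage 1 stand-in: replace placeholders such as [PLAYER] with names.

    Assumption: the placeholder-to-name mapping would come from visual
    reasoning over the video; here the caller supplies it directly.
    """
    aligned = anonymous
    for placeholder, name in entities.items():
        aligned = aligned.replace(placeholder, name)
    return aligned


def refine_with_knowledge(aligned: str, event: str, player: str,
                          state: GameState, season_goals: dict) -> str:
    """Stage 2 stand-in: update the game state, then append statistics.

    `season_goals` plays the role of external historical statistics
    (hypothetical format: player name -> goals before this match).
    """
    state.update(event, player)
    if event != "goal":
        return aligned
    note = f" That is goal number {state.goals[player]} of the match for {player}"
    season = season_goals.get(player)
    if season is not None:
        note += f", who had {season} this season before today"
    return aligned + note + "."


if __name__ == "__main__":
    state = GameState()
    text = align_entities("[PLAYER] scores for [TEAM]!",
                          {"[PLAYER]": "Player A", "[TEAM]": "Team X"})
    print(refine_with_knowledge(text, "goal", "Player A", state,
                                {"Player A": 12}))
```

Keeping the game state as an object that persists across segments is what lets later commentary refer to earlier events in the same match, which is the role the abstract assigns to the iteratively updated internal game state.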