Sporthesia: Augmenting Sports Videos Using Natural Language

Augmented sports videos, which combine visualizations and video effects to present data in actual scenes, communicate insights engagingly and have thus become increasingly popular among sports enthusiasts around the world. Yet creating augmented sports videos remains challenging, requiring considerable time and video-editing skill. Meanwhile, sports insights are often communicated in natural language, such as commentaries, oral presentations, and articles, which usually lack visual cues. This work therefore aims to facilitate the creation of augmented sports videos by enabling analysts to create visualizations embedded in videos directly from insights expressed in natural language. To achieve this goal, we propose a three-step approach: 1) detecting visualizable entities in the text, 2) mapping these entities to visualizations, and 3) scheduling these visualizations to play with the video. To ground these steps, we analyzed 155 sports video clips and their accompanying commentaries. Based on this analysis, we designed and implemented Sporthesia, a proof-of-concept system that takes racket-based sports videos and textual commentaries as input and outputs augmented videos. We demonstrate Sporthesia's applicability in two example scenarios: authoring augmented sports videos using text, and augmenting historical sports videos based on auditory comments. A technical evaluation shows that Sporthesia achieves high accuracy (F1-score of 0.9) in detecting visualizable entities in the text. An expert evaluation with eight sports analysts suggests high utility, effectiveness, and satisfaction with our language-driven authoring method, and yields insights into future improvements and opportunities.
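To make the three-step approach concrete, the following is a minimal, illustrative sketch of how detection, mapping, and scheduling could be chained for a single comment. It is not Sporthesia's implementation: the trigger lexicon, the entity types, the visualization kinds, and the functions `detect_entities`, `map_to_visualizations`, and `schedule` are all hypothetical. The scheduling step also rests on an assumed heuristic, namely that speech progresses linearly over the comment's time span.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# HYPOTHETICAL lexicon: trigger phrase -> entity type. Sporthesia's actual
# entity categories and detection method are not reproduced here.
ENTITY_LEXICON = {
    "serve": "action",
    "smash": "action",
    "km/h": "data",
    "cross-court": "trajectory",
    "down the line": "trajectory",
    "baseline": "location",
    "net": "location",
}

# HYPOTHETICAL mapping: entity type -> kind of embedded visualization.
VIS_FOR_TYPE = {
    "action": "highlight_player",
    "data": "label",
    "trajectory": "ball_path",
    "location": "area_marker",
}


@dataclass
class Entity:
    text: str
    kind: str
    char_start: int


@dataclass
class ScheduledVis:
    vis: str
    entity: str
    start_s: float
    end_s: float


def detect_entities(comment: str) -> list[Entity]:
    """Step 1: find the first whole-word match of each lexicon phrase."""
    lowered = comment.lower()
    found = []
    for phrase, kind in ENTITY_LEXICON.items():
        match = re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
        if match:
            found.append(Entity(phrase, kind, match.start()))
    # Keep the order in which the entities are mentioned in the text.
    return sorted(found, key=lambda e: e.char_start)


def map_to_visualizations(entities: list[Entity]) -> list[tuple[Entity, str]]:
    """Step 2: pair each detected entity with a visualization kind."""
    return [(e, VIS_FOR_TYPE[e.kind]) for e in entities]


def schedule(pairs: list[tuple[Entity, str]], comment: str,
             t0: float, t1: float) -> list[ScheduledVis]:
    """Step 3: start each visualization when its entity is likely spoken,
    ASSUMING speech progresses linearly over [t0, t1] seconds; each
    visualization stays visible until the comment ends."""
    duration = t1 - t0
    return [
        ScheduledVis(vis, e.text,
                     round(t0 + duration * e.char_start / len(comment), 2), t1)
        for e, vis in pairs
    ]


if __name__ == "__main__":
    text = "A 210 km/h serve, then a cross-court smash from the baseline"
    plan = schedule(map_to_visualizations(detect_entities(text)), text, 0.0, 6.0)
    for item in plan:
        print(item)
```

A keyword lexicon keeps the sketch self-contained. A learned entity recognizer, together with word-level timestamps from speech recognition in the auditory-comment scenario, would be the natural replacements for the lexicon and the linear timing heuristic.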
