Sporthesia: Augmenting Sports Videos Using Natural Language

Augmented sports videos, which combine visualizations and video effects to present data in actual scenes, can communicate insights engagingly and have thus become increasingly popular among sports enthusiasts around the world. Yet creating augmented sports videos remains a challenging task, requiring considerable time and video editing skills. Meanwhile, sports insights are often communicated in natural language, such as in commentaries, oral presentations, and articles, but these usually lack visual cues. This work therefore aims to facilitate the creation of augmented sports videos by enabling analysts to create visualizations embedded in videos directly from insights expressed in natural language. To achieve this goal, we propose a three-step approach: 1) detecting visualizable entities in the text, 2) mapping these entities into visualizations, and 3) scheduling these visualizations to play with the video. To accomplish these steps, we analyzed 155 sports video clips and their accompanying commentaries. Informed by this analysis, we designed and implemented Sporthesia, a proof-of-concept system that takes racket-based sports videos and textual commentaries as input and outputs augmented videos. We demonstrate Sporthesia's applicability in two exemplar scenarios: authoring augmented sports videos using text, and augmenting historical sports videos based on auditory comments. A technical evaluation shows that Sporthesia achieves high accuracy (F1-score of 0.9) in detecting visualizable entities in the text. An expert evaluation with eight sports analysts suggests high utility, effectiveness, and satisfaction with our language-driven authoring method, and provides insights into future improvements and opportunities.
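
The three-step approach (detect, map, schedule) can be illustrated with a minimal Python sketch. This is not Sporthesia's implementation, which the abstract does not describe. The keyword lexicon, the entity types, the visual names, and the rule that places each cue in proportion to its word position in the commentary are all hypothetical assumptions chosen only to make the data flow concrete.

```python
from dataclasses import dataclass

# Hypothetical keyword lexicon (assumption): entity text -> entity type.
LEXICON = {
    "forehand": "action",
    "backhand": "action",
    "serve": "action",
    "baseline": "court_area",
    "net": "court_area",
    "ball": "object",
}

# Hypothetical mapping (assumption): entity type -> visual effect name.
VISUALS = {
    "action": "label",
    "court_area": "highlight_region",
    "object": "trajectory",
}


@dataclass
class Entity:
    text: str
    kind: str
    word_index: int  # position of the word in the commentary


@dataclass
class Cue:
    entity: Entity
    visual: str
    start: float  # seconds into the video
    end: float


def detect_entities(commentary):
    """Step 1: find visualizable entities by simple keyword lookup."""
    entities = []
    for i, word in enumerate(commentary.lower().split()):
        token = word.strip(".,!?;:")
        if token in LEXICON:
            entities.append(Entity(token, LEXICON[token], i))
    return entities


def map_to_visuals(entities):
    """Step 2: pair each entity with a visual based on its type."""
    return [(e, VISUALS[e.kind]) for e in entities]


def schedule(pairs, n_words, clip_start, clip_end, hold=1.0):
    """Step 3: place each visual proportionally to its word position.

    Assumes the commentary is spoken at an even pace over the clip.
    """
    duration = clip_end - clip_start
    cues = []
    for entity, visual in pairs:
        start = clip_start + duration * entity.word_index / max(n_words, 1)
        end = min(start + hold, clip_end)
        cues.append(Cue(entity, visual, start, end))
    return cues


def augment(commentary, clip_start, clip_end):
    """Run the three steps and return the scheduled cues."""
    entities = detect_entities(commentary)
    pairs = map_to_visuals(entities)
    return schedule(pairs, len(commentary.split()), clip_start, clip_end)


if __name__ == "__main__":
    for cue in augment("He hits a forehand to the baseline.", 0.0, 7.0):
        print(cue.entity.text, cue.visual, cue.start, cue.end)
```

A real system would replace the keyword lookup with a trained entity detector and would derive timing from the video and speech rather than from word position. The sketch only shows how the output of each step feeds the next.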
