ShuttleSet: A Human-Annotated Stroke-Level Singles Dataset for Badminton Tactical Analysis

With recent progress in sports analytics, deep learning approaches have proven effective at mining insights into players' tactics, which can improve both performance quality and fan engagement. This progress is largely attributed to the availability of public ground-truth datasets. Although a few datasets exist for action detection in turn-based sports, they severely lack structured source data and stroke-level records, because such records require costly labeling by domain experts and are hard to extract with automatic techniques. Consequently, the development of artificial intelligence approaches is significantly hindered when existing models are applied to more challenging structured turn-based sequences. In this paper, we present ShuttleSet, the largest publicly available badminton singles dataset with annotated stroke-level records. It contains 104 sets, 3,685 rallies, and 36,492 strokes in 44 matches played between 2018 and 2021 by 27 top-ranking men's singles and women's singles players. ShuttleSet is manually annotated with a computer-aided labeling tool that makes labeling more efficient and effective: each stroke is labeled with a shot type chosen from 18 distinct classes, the corresponding hitting location, and the locations of both players. In the experiments, we provide multiple benchmarks (i.e., stroke influence, stroke forecasting, and movement forecasting) with baselines to illustrate the practicability of using ShuttleSet for turn-based analytics, which we expect to stimulate work in both the academic and sports communities. Over the past two years, a visualization platform has been deployed to illustrate the variety of analysis cases that ShuttleSet supports, letting coaches delve into players' tactical preferences through interactive interfaces; the platform has also been used by national badminton teams during multiple high-ranking international matches.
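To make the stroke-level structure described above concrete, the following is a minimal sketch of how one annotated stroke might be represented and grouped into rallies. The class name, field names, coordinate convention, and helper functions are all hypothetical and chosen only for illustration; they are not the dataset's actual schema or file format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: locations are (x, y) court coordinates; the real dataset
# may encode positions differently.
Point = Tuple[float, float]
RallyKey = Tuple[int, int, int]  # (match_id, set_no, rally_no), hypothetical


@dataclass(frozen=True)
class Stroke:
    """One stroke-level record (hypothetical field names)."""
    match_id: int
    set_no: int
    rally_no: int
    stroke_no: int            # position of the stroke within its rally
    player: str               # the player who hits this stroke
    shot_type: str            # one of the 18 annotated classes
    hit_location: Point       # where the shuttle is hit
    player_location: Point    # location of the hitting player
    opponent_location: Point  # location of the other player


def group_by_rally(strokes: List[Stroke]) -> Dict[RallyKey, List[Stroke]]:
    """Group strokes into rallies, each ordered by stroke number."""
    rallies: Dict[RallyKey, List[Stroke]] = defaultdict(list)
    for stroke in strokes:
        rallies[(stroke.match_id, stroke.set_no, stroke.rally_no)].append(stroke)
    for sequence in rallies.values():
        sequence.sort(key=lambda s: s.stroke_no)
    return dict(rallies)


def mean_strokes_per_rally(strokes: List[Stroke]) -> float:
    """Average rally length in strokes; 0.0 for an empty input."""
    rallies = group_by_rally(strokes)
    return len(strokes) / len(rallies) if rallies else 0.0


# Averages implied by the totals reported in the abstract.
REPORTED_STROKES, REPORTED_RALLIES = 36_492, 3_685
REPORTED_SETS, REPORTED_MATCHES = 104, 44
reported_strokes_per_rally = REPORTED_STROKES / REPORTED_RALLIES
reported_rallies_per_set = REPORTED_RALLIES / REPORTED_SETS
reported_sets_per_match = REPORTED_SETS / REPORTED_MATCHES
```

Under the reported totals, this works out to roughly 9.9 strokes per rally, 35.4 rallies per set, and 2.4 sets per match. Grouping strokes into ordered rallies mirrors the turn-based sequence structure that the forecasting benchmarks rely on.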
