We introduce the most comprehensive publicly available datasets for mixed doubles curling, constructed from the CurlIT (https://curlit.com/results) Results Booklets for eleven top-level tournaments and spanning 53 countries, 1,112 games, and nearly 70,000 recorded shots. While curling analytics has grown in recent years, mixed doubles remains under-served due to limited access to data. Using a combined text-scraping and image-processing pipeline, we extract and standardize detailed game- and shot-level information, including player statistics, hammer possession, Power Play usage, stone coordinates, and post-shot scoring states. We describe the data engineering workflow, highlight challenges in parsing historical records, and derive additional contextual features that enable rigorous strategic analysis. Using these datasets, we present initial insights into shot selection and success rates, scoring distributions, and team efficiencies, illustrating key differences between mixed doubles and traditional four-player curling. To demonstrate the data's versatility, we show how it can be analyzed at the shot, end, game, and team levels. The resulting resources provide a foundation for advanced performance modeling, strategic evaluation, and future research in mixed doubles curling analytics, supporting broader analytical engagement with this rapidly growing discipline.