Cricket, "a Gentleman's Game", is a sport of rising prominence worldwide. As the sport grows more competitive, players and team management have become more professional in their approach. Prior studies predicted individual performance or selected the best team but did not highlight a batter's potential. Our research instead aims to evaluate a player's impact while considering his control in various circumstances. This paper seeks to understand what lies behind such impactful performances by determining how much control a player has over the circumstances and by computing "Effective Runs", a new measure we propose. We first gathered the fundamental cricket data from open-source datasets; however, variables such as pitch, weather, and control were not readily available for all matches. We therefore compiled our own corpus by analyzing the commentary in match summaries, which gave us insight into each game's weather and pitch conditions. Furthermore, ball-by-ball inspection of the commentary let us determine the control of the shots played by the batter. We collected data covering the entire One Day International (ODI) careers, up to February 2022, of three prominent cricket players: Rohit G Sharma, David A Warner, and Kane S Williamson. Lastly, to prepare the dataset, we encoded, scaled, and split it to train and test Machine Learning algorithms. We trained Multiple Linear Regression (MLR), Polynomial Regression, Support Vector Regression (SVR), Decision Tree Regression, and Random Forest Regression on each player's data individually to predict the Impact the player will have on the game. Multiple Linear Regression and Random Forest Regression give the best prediction accuracies, 90.16 percent and 87.12 percent respectively.
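The abstract does not specify the feature columns, the definition of Impact or Effective Runs, or how "accuracy" is computed for a regression target. The Python below is therefore only a minimal sketch of the encode, scale, split, and fit steps for the Multiple Linear Regression case, under loudly stated assumptions: the column names (pitch, weather, control percentage, balls faced), the toy Impact formula, the synthetic data, and the use of R² as the evaluation metric are all hypothetical and are not the authors' pipeline. It uses only the standard library, with ordinary least squares solved through the normal equations.

```python
import random

# HYPOTHETICAL features: the abstract mentions pitch, weather, and shot control,
# but not the exact columns, categories, or target definition.
PITCH = ["balanced", "batting", "bowling"]
WEATHER = ["clear", "overcast"]


def make_synthetic_innings(n, seed=0):
    """Toy rows whose target is an exact linear function (invented formula)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        pitch, weather = PITCH[i % 3], WEATHER[i % 2]
        control = rng.uniform(40.0, 100.0)  # % of controlled shots (hypothetical)
        balls = rng.uniform(1.0, 150.0)     # balls faced (hypothetical)
        impact = (5.0 + 0.5 * control + 0.3 * balls
                  + 4.0 * (pitch == "batting") - 3.0 * (pitch == "bowling")
                  - 2.0 * (weather == "overcast"))
        rows.append((pitch, weather, control, balls, impact))
    return rows


def fit_scaler(rows):
    """Min-max bounds for the numeric columns, fitted on training rows only."""
    controls = [r[2] for r in rows]
    balls = [r[3] for r in rows]
    return (min(controls), max(controls)), (min(balls), max(balls))


def encode(row, scaler):
    """Drop-first one-hot encoding plus min-max scaling of numeric columns."""
    pitch, weather, control, balls, _ = row
    (c_lo, c_hi), (b_lo, b_hi) = scaler
    feats = [1.0 if pitch == c else 0.0 for c in PITCH[1:]]
    feats += [1.0 if weather == w else 0.0 for w in WEATHER[1:]]
    return feats + [(control - c_lo) / (c_hi - c_lo), (balls - b_lo) / (b_hi - b_lo)]


def gauss_jordan(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [v] for row, v in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear(x, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    a = [[1.0] + row for row in x]  # prepend an intercept column
    k = len(a[0])
    xtx = [[sum(r[i] * r[j] for r in a) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(a, y)) for i in range(k)]
    return gauss_jordan(xtx, xty)


def predict(beta, feats):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], feats))


def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def run_pipeline(rows):
    # Deterministic hold-out split: every 4th row goes to the test set.
    train = [r for i, r in enumerate(rows) if i % 4 != 3]
    test = [r for i, r in enumerate(rows) if i % 4 == 3]
    scaler = fit_scaler(train)
    beta = fit_linear([encode(r, scaler) for r in train], [r[4] for r in train])
    preds = [predict(beta, encode(r, scaler)) for r in test]
    return r2_score([r[4] for r in test], preds)


if __name__ == "__main__":
    print(f"Test R^2: {run_pipeline(make_synthetic_innings(40)):.4f}")
```

Because the toy target is noise-free and linear in the encoded features, the test R² should come out at essentially 1.0; on real commentary-derived data it would not. Fitting the scaler on the training rows only keeps test-set information out of preprocessing, and drop-first encoding avoids the dummy-variable collinearity that would otherwise make the normal equations singular.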