This study introduces a machine learning method for predicting soccer players' market values that combines ensemble models with SHapley Additive exPlanations (SHAP) for interpretability. The data cover about 12,000 players from Sofifa, and the Boruta algorithm was used for feature selection. The Gradient Boosting Decision Tree (GBDT) model stood out for predictive accuracy, with an R-squared of 0.901 and a Root Mean Squared Error (RMSE) of 3,221,632.175. Player attributes in the skill, fitness, and cognitive areas significantly influenced market value. These findings can help sports industry stakeholders value players. The study has limitations: the model underestimates the market values of superstar players, and larger datasets are needed. Future research directions include broadening the model's applicability and exploring value prediction in other contexts.
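To make the reported quantities concrete, the following is a minimal sketch, not the study's pipeline. It shows how R-squared and RMSE are computed from predictions, and how Shapley values (the quantity that SHAP approximates) attribute one prediction to individual features by exact enumeration of feature coalitions. The function `toy_model`, the three attribute names, and all numbers are hypothetical and chosen only for illustration. Replacing absent features with a fixed baseline is one common convention and is an assumption here.

```python
from itertools import combinations
from math import factorial, sqrt


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance x.

    Features outside a coalition are replaced by their baseline value
    (an assumed convention). Cost grows as 2^n, so this is only
    practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition without feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical valuation function over three made-up attributes.
def toy_model(z):
    skill, fitness, composure = z
    return 2.0 * skill + 1.0 * fitness + 0.5 * skill * composure


x = [80.0, 70.0, 60.0]          # the player being explained (made up)
baseline = [50.0, 50.0, 50.0]   # reference player (made up)
phi = shapley_values(toy_model, x, baseline)

# Efficiency property: attributions sum to f(x) - f(baseline).
gap = toy_model(x) - toy_model(baseline)
print(phi, sum(phi), gap)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the property that makes Shapley-based explanations additive. For tree ensembles such as GBDT, SHAP tooling typically relies on faster tree-specific algorithms rather than this brute-force enumeration.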