Time-to-Injury Forecasting in Elite Female Football: A DeepHit Survival Approach

Injuries in football pose significant challenges for athletes and teams, carrying personal, competitive, and financial consequences. Machine learning has been applied to injury prediction before, but existing approaches often rely on static pre-season data and binary outcomes, which limits their real-world utility. This study investigates the feasibility of using a DeepHit neural network to forecast time-to-injury from longitudinal athlete monitoring data while keeping the predictions interpretable. The analysis used the publicly available SoccerMon dataset, which contains two seasons of training, match, and wellness records from elite female footballers. The data were pre-processed through cleaning, feature engineering, and three imputation strategies. Baseline models (Random Forest, XGBoost, Logistic Regression) were optimised via grid search for benchmarking, while the DeepHit model, implemented with a multilayer perceptron backbone, was evaluated using chronological and leave-one-player-out (LOPO) validation. DeepHit achieved a concordance index of 0.762, outperforming the baseline models and delivering individualised, time-varying risk estimates. Shapley Additive Explanations (SHAP) identified clinically relevant predictors consistent with established risk factors, which supports the interpretability of the model. Overall, this study provides a novel proof of concept: survival modelling with DeepHit shows strong potential to advance injury forecasting in football, offering accurate, explainable, and actionable insights for injury prevention across competitive levels.
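To make the two evaluation ideas named above concrete, the following minimal Python sketch shows how a concordance index and leave-one-player-out splits can be computed. It is an illustration under stated assumptions, not the study's implementation: the function names, the toy values, and the choice to skip pairs with tied observed times are all hypothetical, and the risk scores stand in for whatever a fitted survival model would output.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C-index on right-censored data (illustrative only).

    times  : observed time to injury or censoring
    events : 1 if an injury was observed, 0 if the record was censored
    risks  : model risk scores, where higher means earlier expected injury
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the record with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplifying assumption: skip tied times; a pair is also not
        # comparable when the earlier record is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def leave_one_player_out(player_ids):
    """Yield (held_out_player, train_indices, test_indices) per player."""
    for player in sorted(set(player_ids)):
        test = [k for k, p in enumerate(player_ids) if p == player]
        train = [k for k, p in enumerate(player_ids) if p != player]
        yield player, train, test


# Hypothetical toy data: four records, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)

# Hypothetical player labels for the same four records.
toy_players = ["p1", "p2", "p1", "p3"]
toy_splits = list(leave_one_player_out(toy_players))
```

In the toy data, five pairs are comparable and four of them are ordered correctly, so `toy_c_index` is 0.8. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the reported 0.762. Holding out all records of one player at a time keeps that player's data out of training, so each fold tests generalisation to an unseen athlete.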
