Machine learning models for predicting child mortality tend to lose accuracy when applied to future populations because they suffer from look-ahead bias introduced by randomized cross-validation. This paper uses Demographic and Health Surveys (DHS) data from Bangladesh for 2011-2022 (n = 33,962). We trained models on 2011-2014 data, validated them on 2017 data, and tested them on 2022 data. On the 2022 test set, eight years after the end of the training period, a genetic algorithm-based Neural Architecture Search found that a single-layer neural network with 64 units outperformed XGBoost (AUROC = 0.76 vs. 0.73; p < 0.01). A detailed fairness audit identified a "Socioeconomic Predictive Gradient": regional wealth correlated negatively with the model's AUC (r = -0.62). The model performed best in the least affluent divisions (AUC 0.74) and markedly worse in the wealthiest divisions (AUC 0.66). These findings suggest that the model performs best in the areas with the greatest need for intervention. At a 10% screening threshold, our model would identify approximately 1,300 more at-risk children annually than a gradient boosting model. With its predictions interpreted using SHAP values and calibrated using Platt calibration, it provides a robust, production-ready computational phenotype for targeted maternal and child health interventions.
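The temporal evaluation design and the 10% screening threshold described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the field name `survey_year`, the helper names `temporal_split` and `flag_top_fraction`, and the reading of "10% level" as "flag the top 10% of children by predicted risk" are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def temporal_split(
    records: Sequence[Record],
    year_key: str = "survey_year",          # hypothetical field name
    train_years: Tuple[int, int] = (2011, 2014),
    val_year: int = 2017,
    test_year: int = 2022,
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split records by survey year instead of at random.

    Training only on earlier years and evaluating on later ones avoids the
    look-ahead bias that randomized cross-validation can introduce.
    Records from years outside the three windows are dropped.
    """
    lo, hi = train_years
    train = [r for r in records if lo <= r[year_key] <= hi]
    val = [r for r in records if r[year_key] == val_year]
    test = [r for r in records if r[year_key] == test_year]
    return train, val, test


def flag_top_fraction(scores: Sequence[float], fraction: float = 0.10) -> List[bool]:
    """Flag the highest-scoring `fraction` of cases for screening.

    Assumes "screening at the 10% level" means selecting the top 10% of
    predicted risks; at least one case is always flagged.
    """
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:k])
    return [i in flagged for i in range(len(scores))]
```

Comparing two models under this scheme would amount to counting how many true positives each one places among its flagged cases on the 2022 split.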