Temporal Validation Changes the Apparent Public-Health Utility of Under-Five Mortality Prediction in Bangladesh: A Four-Round DHS Machine-Learning Study

Md Muhtasim Munif Fahim, M. Monimul Huq, M. Sabiruzzaman, Md Rezaul Karim

From arXiv. 26 pages, 6 figures. Submitted to BMC Medical Informatics.

Background: Under-five mortality in Bangladesh remains uneven despite national progress. Prediction models built on Demographic and Health Survey (DHS) data may guide targeted follow-up, but only if their validation reflects how they would be used on future data. We examined how validation design changes apparent prediction performance.

Methods: We analysed four Bangladesh DHS (BDHS) rounds (2011-2022; 33,962 children; 1,290 deaths) with a 26-feature pipeline and three model classes under four validation regimes, including cross-survey temporal validation (train on 2011+2014, calibrate on 2017, test on 2022). A 32-unit ELU multilayer perceptron (MLP) was selected via genetic-algorithm neural architecture search (NAS). AUROC was estimated with 2,000 bootstrap resamples; screening utility was measured by sensitivity, positive predictive value (PPV), and number needed to screen (NNS) at a fixed screening capacity.

Results: Validation regime altered the public-health interpretation more than model class did. AUROC for the NAS MLP ranged from 0.669 (2022-only random split) to 0.775 (pooled random split), with a temporal-validation AUROC of 0.730. At the top-10% temporal threshold, the NAS MLP identified 152 of 355 deaths in 2022 (sensitivity 42.8%, PPV 13.2%, NNS 7.6). Across validation designs, NNS ranged from 5.6 to 11.0.

Conclusions: The choice of validation regime changed screening workload and apparent policy value more than the choice of architecture. Temporal validation supports defensible estimates of follow-up and referral demand; DHS child-mortality studies should report sensitivity, PPV, and NNS before programmatic use.
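
The screening metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes that "fixed capacity" means flagging the highest-risk fraction of children in the test set, and that NNS is the number of children flagged per death identified (that is, 1/PPV), which is consistent with the reported PPV of 13.2% and NNS of 7.6. The function name `screening_metrics` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np


def screening_metrics(y_true, risk_score, capacity=0.10):
    """Sensitivity, PPV and NNS when the top `capacity` fraction is flagged.

    Hypothetical helper for illustration only, not the authors' code.
    """
    y_true = np.asarray(y_true, dtype=int)
    risk_score = np.asarray(risk_score, dtype=float)

    # Number of children the programme can follow up at this capacity.
    n_flagged = max(1, int(round(capacity * y_true.size)))

    # Indices of the highest-risk children (stable sort keeps ties ordered).
    order = np.argsort(-risk_score, kind="stable")
    flagged = order[:n_flagged]

    true_positives = int(y_true[flagged].sum())
    total_deaths = int(y_true.sum())

    sensitivity = true_positives / total_deaths if total_deaths else float("nan")
    ppv = true_positives / n_flagged
    # NNS taken as flagged children per death identified, i.e. 1 / PPV.
    nns = n_flagged / true_positives if true_positives else float("inf")

    return {
        "n_flagged": n_flagged,
        "true_positives": true_positives,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "nns": nns,
    }


# Toy data (not from the study): 20 children, 4 deaths, capacity 25%.
scores = np.arange(20, 0, -1) / 20.0   # strictly decreasing risk scores
deaths = np.zeros(20, dtype=int)
deaths[[0, 3, 7, 12]] = 1              # deaths at these positions

m = screening_metrics(deaths, scores, capacity=0.25)
print(m)
```

In the toy example, five children are flagged and two of the four deaths fall among them. The same arithmetic applied to the abstract's figures gives 152/355 ≈ 42.8% sensitivity and 1/0.132 ≈ 7.6 children screened per death identified.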
