Background: Under-five mortality in Bangladesh remains uneven despite national progress. Prediction models built on Demographic and Health Survey (DHS) data may guide targeted follow-up, but only if their validation reflects how they will be used on future data. We examined how validation design changes apparent predictive performance. Methods: Four Bangladesh DHS (BDHS) rounds (2011-2022; 33,962 children; 1,290 deaths) were analysed with a 26-feature pipeline and three model classes under four validation regimes, including cross-survey temporal validation (train on 2011+2014, calibrate on 2017, test on 2022). A 32-unit multilayer perceptron (MLP) with ELU activations was selected by genetic-algorithm neural architecture search (NAS). AUROC was evaluated with 2,000 bootstrap resamples; screening utility was assessed by sensitivity, positive predictive value (PPV), and number needed to screen (NNS) at fixed screening capacity. Results: Validation regime altered public-health interpretation more than model class did. NAS MLP AUROC ranged from 0.669 (2022-only random split) to 0.775 (pooled random split), with a temporal AUROC of 0.730. At the top-10% temporal threshold, the NAS MLP identified 152 of 355 deaths in 2022 (sensitivity 42.8%, PPV 13.2%, NNS 7.6). NNS across designs ranged from 5.6 to 11.0. Conclusions: The choice of validation regime changed screening workload and apparent policy value more than the choice of architecture did. Temporal validation supports defensible estimates of follow-up and referral demand; DHS child-mortality studies should report sensitivity, PPV, and NNS before programmatic use.
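The fixed-capacity screening metrics named in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `screening_metrics`, its `capacity` parameter, and the toy data are assumptions introduced here for illustration. The sketch flags the highest-scoring share of children, then computes sensitivity, PPV, and NNS, taking NNS as the reciprocal of PPV. The last lines check the reported temporal-validation figures against each other using only numbers stated in the abstract.

```python
import numpy as np


def screening_metrics(y_true, scores, capacity=0.10):
    """Sensitivity, PPV and NNS when only the top `capacity` share is followed up.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)

    # Number of children the programme can follow up (at least one).
    # The small epsilon guards against floating-point overshoot in the product.
    n_flag = max(1, int(np.ceil(capacity * y_true.size - 1e-9)))

    # Highest predicted risk first; a stable sort keeps ties in input order.
    flagged = np.argsort(-scores, kind="stable")[:n_flag]

    true_positives = int(y_true[flagged].sum())
    deaths = int(y_true.sum())

    sensitivity = true_positives / deaths if deaths else float("nan")
    ppv = true_positives / n_flag
    nns = 1.0 / ppv if ppv > 0 else float("inf")  # NNS taken as 1 / PPV
    return {
        "flagged": n_flag,
        "true_positives": true_positives,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "nns": nns,
    }


# Synthetic toy data (not BDHS data): 10 children, 2 deaths, 20% capacity.
toy_y = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.05, 0.15, 0.25, 0.35]
toy = screening_metrics(toy_y, toy_scores, capacity=0.20)

# Consistency check using only figures reported in the abstract.
reported_sensitivity = 152 / 355  # deaths identified / deaths in the 2022 test round
reported_nns = 1 / 0.132          # reciprocal of the reported PPV

print(toy)
print(round(reported_sensitivity, 3), round(reported_nns, 1))
```

Fixing the flagged share rather than a probability cutoff keeps workload constant across validation regimes, so differences in NNS reflect changes in ranking quality and outcome prevalence rather than changes in how many children are flagged.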