Compatibility of Missing Data Handling Methods across the Stages of Producing Clinical Prediction Models

Missing data is a challenge when developing, validating and deploying clinical prediction models (CPMs). Traditionally, decisions about missing data handling during CPM development and validation have not accounted for whether missingness is allowed at deployment. We hypothesised that the missing data approach used during model development should optimise model performance upon deployment, whilst the approach used during model validation should yield unbiased estimates of predictive performance upon deployment; we term this compatibility. We aimed to determine which combinations of missing data handling methods across the CPM life cycle are compatible. We considered scenarios in which CPMs are intended to be deployed either with or without missing data allowed, and we evaluated the impact of that choice on earlier modelling decisions. Through a simulation study and an empirical analysis of thoracic surgery data, we compared CPMs developed and validated using combinations of complete case analysis, mean imputation, single regression imputation, multiple imputation, and pattern sub-modelling. If a CPM is to be deployed without allowing missing data, then development and validation should use multiple imputation when required. Where missingness is allowed at deployment, the same imputation method must be used during development and validation. Commonly used combinations of missing data handling methods result in biased predictive performance estimates.
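
As an illustration of the recommendation that the same imputation method be used during development and validation when missingness is allowed at deployment, the following is a minimal sketch using mean imputation. It is not the authors' code: the function names `fit_mean_imputer` and `apply_mean_imputer` and the toy arrays are hypothetical, and the choice to reuse the development-set means at validation is an assumption about how "the same method" would be applied in practice.

```python
import numpy as np

def fit_mean_imputer(X_dev):
    """Learn per-column means from the development data, ignoring NaNs."""
    return np.nanmean(X_dev, axis=0)

def apply_mean_imputer(X, means):
    """Return a copy of X with each NaN replaced by its column's stored mean."""
    X_filled = X.copy()
    rows, cols = np.where(np.isnan(X_filled))
    X_filled[rows, cols] = means[cols]
    return X_filled

# Toy data (hypothetical): two predictors, NaN marks a missing value.
X_dev = np.array([[1.0, 10.0],
                  [3.0, np.nan],
                  [np.nan, 30.0]])
X_val = np.array([[np.nan, 5.0],
                  [2.0, np.nan]])

# Fit the imputer once, on the development data only.
means = fit_mean_imputer(X_dev)

# Apply the identical imputer at development and at validation, so the
# validation data are handled the way deployment data would be.
X_dev_imp = apply_mean_imputer(X_dev, means)
X_val_imp = apply_mean_imputer(X_val, means)
```

The same `means` vector would then be stored with the model and applied to incoming records at deployment, so that the performance estimated at validation reflects how missing values are actually handled in use.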

