Digital twins hold substantial promise in many applications, but rigorous procedures for assessing their accuracy are essential for their widespread deployment in safety-critical settings. By formulating this task within the framework of causal inference, we show that attempts to certify the correctness of a twin using real-world observational data are unsound unless potentially tenuous assumptions are made about the data-generating process. To avoid these assumptions, we propose an assessment strategy that instead aims to find cases where the twin is not correct, and present a general-purpose statistical procedure for doing so that may be used across a wide variety of applications and twin models. Our approach yields reliable and actionable information about the twin under minimal assumptions about the twin and the real-world process of interest. We demonstrate the effectiveness of our methodology via a large-scale case study involving sepsis modelling within the Pulse Physiology Engine, which we assess using the MIMIC-III dataset of ICU patients.