Digital twins are virtual systems designed to predict how a real-world process will evolve in response to interventions. This modelling paradigm holds substantial promise in many applications, but rigorous procedures for assessing the accuracy of digital twins are essential in safety-critical settings. We consider how to assess the accuracy of a digital twin using real-world data. We formulate this as a causal inference problem, which leads to a precise definition, appropriate for many applications, of what it means for a twin to be "correct". Unfortunately, fundamental results from causal inference mean that observational data cannot be used to certify that a twin is correct in this sense unless potentially tenuous assumptions are made, such as that the data are unconfounded. To avoid these assumptions, we instead propose to find situations in which the twin is not correct, and we present a general-purpose statistical procedure for doing so. Our approach yields reliable and actionable information about the twin under only the assumption of an i.i.d. dataset of observational trajectories, and it remains sound even if the data are confounded. We apply our methodology to a large-scale, real-world case study involving sepsis modelling within the Pulse Physiology Engine, which we assess using the MIMIC-III dataset of ICU patients.