Developing models that are capable of answering questions of the form "How would x change if y had been z?" is fundamental to advancing medical image analysis. Currently, however, training causal generative models that address such counterfactual questions requires that all relevant variables have been observed and that the corresponding labels are available in the training data. In practice, clinical data may not have complete records for all patients, and state-of-the-art causal generative models are unable to take full advantage of such partially labelled data. We thus develop, for the first time, a semi-supervised deep causal generative model that exploits the causal relationships between variables to maximise the use of all available data. We explore this in the setting where each sample is either fully labelled or fully unlabelled, as well as in the more clinically realistic case where different labels are missing for each sample. We leverage techniques from causal inference to infer missing values and subsequently generate realistic counterfactuals, even for samples with incomplete labels.