Developing models that can answer questions of the form "How would $x$ change if $y$ had been $z$?" is fundamental for advancing medical image analysis. Training causal generative models that address such counterfactual questions, however, currently requires that all relevant variables have been observed and that the corresponding labels are available in the training data. Clinical data may not contain complete records for all patients, and state-of-the-art causal generative models are unable to take full advantage of such incompletely labelled data. We therefore develop, for the first time, a semi-supervised deep causal generative model that exploits the causal relationships between variables to maximise the use of all available data. We explore this in the setting where each sample is either fully labelled or fully unlabelled, as well as in the more clinically realistic setting where different labels are missing for each sample. We leverage techniques from causal inference to infer missing values and subsequently generate realistic counterfactuals, even for samples with incomplete labels.