Counterfactual image generation is pivotal for understanding the causal relations between variables, with applications in interpretability and the generation of unbiased synthetic data. However, evaluating image generation is a long-standing challenge in itself. The need to evaluate counterfactual generation compounds this challenge, precisely because counterfactuals are, by definition, hypothetical scenarios without observable ground truth. In this paper, we present a novel, comprehensive framework for benchmarking counterfactual image generation methods. We incorporate metrics that evaluate diverse aspects of counterfactuals, such as composition, effectiveness, minimality of interventions, and image realism. We assess the performance of three distinct types of conditional image generation models based on the Structural Causal Model (SCM) paradigm. Our work is accompanied by a user-friendly Python package that allows users to further evaluate and benchmark existing and future counterfactual image generation methods. Our framework can be extended to additional SCM-based and other causal methods, generative models, and datasets.