The growing integration of machine learning (ML) and artificial intelligence (AI) models into high-stakes domains such as healthcare and scientific research calls for models that are not only accurate but also interpretable. Among existing explainability methods, counterfactual explanations provide interpretability by identifying minimal changes to an input that would alter a model's prediction, thereby offering insight into what drives the decision. However, current counterfactual generation methods suffer from critical limitations, including vanishing gradients, discontinuous latent spaces, and overreliance on alignment between the learned and true decision boundaries. To overcome these limitations, we propose LeapFactual, a novel counterfactual explanation algorithm based on conditional flow matching. LeapFactual generates reliable and informative counterfactuals even when the true and learned decision boundaries diverge. Because it is model-agnostic, LeapFactual is not limited to models with differentiable loss functions; it can even handle human-in-the-loop systems, extending counterfactual explanations to domains that require the participation of human annotators, such as citizen science. Extensive experiments on benchmark and real-world datasets show that LeapFactual generates accurate, in-distribution counterfactual explanations that offer actionable insights. For instance, we observe that reliable counterfactual samples whose labels align with the ground truth can be used as additional training data to improve the model. The proposed method is broadly applicable and enhances both scientific knowledge discovery and interpretability for non-experts.
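The abstract does not spell out LeapFactual's training objective. As background only, the sketch below shows the generic conditional flow matching objective with a straight-line probability path: a point is placed at a random time t on the segment from a source sample x0 to a target sample x1, and a conditional velocity model is regressed onto that segment's constant velocity x1 - x0. Generation then integrates the learned velocity field from t = 0 to t = 1. The function names `cfm_loss` and `integrate`, the forward Euler integrator, and the toy "shift vector" conditioning are illustrative assumptions, not LeapFactual's actual formulation.

```python
import numpy as np


def cfm_loss(velocity_fn, x0, x1, cond, rng):
    """Monte Carlo estimate of the conditional flow matching loss
    for a straight-line path between paired samples x0 and x1."""
    n = x0.shape[0]
    # One time per sample, shaped (n, 1) so it broadcasts over features.
    t = rng.uniform(0.0, 1.0, size=(n, 1))
    # Point on the straight path from x0 (t = 0) to x1 (t = 1).
    x_t = (1.0 - t) * x0 + t * x1
    # Velocity of that path; constant in t for linear interpolation.
    target = x1 - x0
    # Conditional velocity predicted by the model at (x_t, t, cond).
    pred = velocity_fn(x_t, t, cond)
    # Mean over samples of the squared Euclidean error.
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))


def integrate(velocity_fn, x0, cond, steps=100):
    """Transport x0 from t = 0 to t = 1 with forward Euler on dx/dt = v(x, t, c)."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=(256, 2))
    # Toy (hypothetical) setup: each target is its source shifted by a fixed
    # vector, and that vector is passed as the conditioning signal.
    shift = np.array([3.0, -1.0])
    x1 = x0 + shift
    cond = np.tile(shift, (256, 1))

    def oracle(x, t, c):
        # A field that returns the conditioning vector exactly matches the target velocity.
        return c

    print(cfm_loss(oracle, x0, x1, cond, rng))
    print(np.allclose(integrate(oracle, x0, cond), x1))
```

In this toy case the oracle field attains (numerically) zero loss, and integrating it moves every source sample onto its target. A learned network would replace `oracle`, with `cond` encoding the desired class rather than a known shift.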