Counterfactual explanations (CFEs) guide users on how to adjust inputs to machine learning models to achieve desired outputs. While existing research primarily addresses static scenarios, real-world applications often involve data or model changes, potentially invalidating previously generated CFEs and rendering user-induced input changes ineffective. Current methods addressing this issue often support only specific models or change types, require extensive hyperparameter tuning, or fail to provide probabilistic guarantees on CFE robustness to model changes. This paper proposes a novel approach for generating CFEs that provides probabilistic guarantees for any model and change type, while offering interpretable and easy-to-select hyperparameters. We establish a theoretical framework for probabilistically defining robustness to model change and demonstrate how our BetaRCE method stems directly from it. BetaRCE is a post-hoc method applied alongside a chosen base CFE generation method to enhance the quality of the explanation beyond robustness. It facilitates a transition from the base explanation to a more robust one with user-adjusted probability bounds. Through experimental comparisons with baselines, we show that BetaRCE yields counterfactual explanations that are robust, among the most plausible, and closest to the base explanations.