Counterfactual explanations (CFs) provide human-interpretable insights into a model's predictions by identifying minimal changes to input features that would alter the model's output. However, existing methods struggle to generate multiple high-quality explanations that (1) affect only a small portion of the features, (2) can be applied to tabular data with heterogeneous features, and (3) are consistent with user-defined constraints. We propose CounterFlowNet, a generative approach that formulates CF generation as sequential feature modification using conditional Generative Flow Networks (GFlowNets). CounterFlowNet is trained to sample CFs in proportion to a user-specified reward function that can encode key CF desiderata: validity, sparsity, proximity, and plausibility, encouraging high-quality explanations. The sequential formulation yields highly sparse edits, while a unified action space seamlessly supports both continuous and categorical features. Moreover, actionability constraints, such as immutability and monotonicity of features, can be enforced at inference time via action masking, without retraining. Experiments on eight datasets under two evaluation protocols demonstrate that CounterFlowNet achieves superior trade-offs between validity, sparsity, plausibility, and diversity while fully satisfying the given constraints.
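To make the idea of enforcing actionability constraints through action masking concrete, the following is a minimal sketch and not CounterFlowNet's implementation. It assumes a hypothetical action encoding in which each action is a `(feature, delta)` pair, and the names `build_mask`, `masked_softmax`, `immutable`, and `increase_only` are illustrative only. Invalid actions receive zero probability at sampling time, so the policy's logits (and hence its trained parameters) are left untouched.

```python
import math

def build_mask(actions, immutable, increase_only):
    """Return one boolean per action: True if the action respects the constraints.

    actions: list of (feature, delta) pairs (hypothetical encoding).
    immutable: set of features that may never be modified.
    increase_only: set of features that may only increase (monotonicity).
    """
    mask = []
    for feature, delta in actions:
        if feature in immutable:
            mask.append(False)          # immutability: feature cannot change
        elif feature in increase_only and delta < 0:
            mask.append(False)          # monotonicity: decreases are forbidden
        else:
            mask.append(True)
    return mask

def masked_softmax(logits, mask):
    """Softmax over the allowed actions only; masked actions get probability 0."""
    allowed = [l for l, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("no valid action under the given constraints")
    top = max(allowed)                  # subtract the max for numerical stability
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative action set and policy logits (made-up values).
actions = [("age", 1), ("age", -1), ("sex", 1), ("income", 500), ("income", -500)]
logits = [0.2, 1.5, 3.0, 0.1, 0.4]

mask = build_mask(actions, immutable={"sex"}, increase_only={"age"})
probs = masked_softmax(logits, mask)
```

In this sketch, decreasing `age` and changing `sex` are masked out even though the policy assigns them the highest logits, which illustrates why constraints of this kind can be changed at inference time without retraining.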