Counterfactual explanations (CE) aim to reveal how small input changes flip a model's prediction, yet many methods modify more features than necessary, reducing clarity and actionability. We introduce \emph{COLA}, a model- and generator-agnostic post-hoc framework that refines any given CE by computing an optimal-transport (OT) coupling between the factual and counterfactual sets and using it to drive a Shapley-based attribution (\emph{$p$-SHAP}) that selects a minimal set of edits while preserving the target effect. Theoretically, we show that the OT coupling minimizes an upper bound on the $W_1$ distance between factual and counterfactual outcomes and that, under mild conditions, the refined counterfactuals are guaranteed not to move farther from the factuals than the originals. Empirically, across four datasets, twelve models, and five CE generators, COLA achieves the same target effects with only 26--45\% of the original feature edits. On a small-scale benchmark, COLA attains near-optimal performance.
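For context, the coupling referenced above is a minimizer of the standard Kantorovich (Wasserstein-1) objective; the display below is an illustrative sketch in our own notation ($\mu$ and $\nu$ for the empirical factual and counterfactual distributions, $\Pi(\mu,\nu)$ for the set of couplings between them), not the paper's formal statement:
\[
W_1(\mu,\nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \int \lVert x - x' \rVert \,\mathrm{d}\pi(x, x'),
\]
where a minimizing coupling $\pi^\star$ matches each factual point to the counterfactual mass it is edited toward, and this matching is what feeds the subsequent $p$-SHAP selection step.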