Latent-space optimization methods for counterfactual explanations, framed as minimal semantic perturbations that change model predictions, inherit the ambiguity of Wachter et al.'s objective: the choice of distance metric dictates whether perturbations are meaningful or adversarial. Existing approaches adopt flat or misaligned geometries, leading to off-manifold artifacts, semantic drift, or adversarial collapse. We introduce Perceptual Counterfactual Geodesics (PCG), a method that constructs counterfactuals by tracing geodesics under a perceptually Riemannian metric induced from robust vision features. This geometry aligns with human perception and penalizes brittle directions, enabling smooth, on-manifold, semantically valid transitions. Experiments on three vision datasets show that PCG outperforms baselines and reveals failure modes that remain hidden under standard metrics.
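To make the idea of a geodesic under a feature-induced metric concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes the metric is a pullback: the length of a latent path is measured after mapping each point through a feature map. The names `feature_map`, `straight_path`, `curve_energy`, and `refine` are hypothetical, and `feature_map` is a simple paraboloid standing in for a decoder followed by a robust vision feature extractor. The sketch discretizes a latent path and lowers its curve energy by coordinate-wise gradient descent with finite-difference gradients, keeping the endpoints fixed.

```python
def feature_map(z):
    """Hypothetical stand-in for 'decode, then extract robust features'."""
    x, y = z
    return (x, y, x * x + y * y)  # toy curved surface, not a real network


def straight_path(z0, z1, n):
    """Linear interpolation in latent space: n segments, n + 1 points."""
    return [tuple(a + (b - a) * k / n for a, b in zip(z0, z1))
            for k in range(n + 1)]


def curve_energy(path, h=feature_map):
    """Discrete curve energy 0.5 * n * sum_k ||h(z_{k+1}) - h(z_k)||^2."""
    n = len(path) - 1
    feats = [h(p) for p in path]
    total = sum((q - p) ** 2
                for fa, fb in zip(feats, feats[1:])
                for p, q in zip(fa, fb))
    return 0.5 * n * total


def refine(path, h=feature_map, steps=100, lr=0.005, eps=1e-5):
    """Lower the energy by updating interior points; endpoints stay fixed."""
    pts = [list(p) for p in path]
    for _ in range(steps):
        for i in range(1, len(pts) - 1):
            for d in range(len(pts[i])):
                base = pts[i][d]
                # Central finite difference of the energy in one coordinate.
                pts[i][d] = base + eps
                e_plus = curve_energy(pts, h)
                pts[i][d] = base - eps
                e_minus = curve_energy(pts, h)
                pts[i][d] = base - lr * (e_plus - e_minus) / (2 * eps)
    return [tuple(p) for p in pts]


start, end = (-1.0, 0.5), (1.0, 0.5)
flat = straight_path(start, end, 8)   # geodesic of a flat latent metric
bent = refine(flat)                   # approximate geodesic of the toy metric
print(curve_energy(flat), curve_energy(bent))
```

In this toy setting the straight latent line is what a flat geometry would return, while the refined path has lower energy under the feature-induced metric. A real pipeline would presumably replace the finite differences with automatic differentiation and add a term that drives the endpoint toward the target prediction; both are assumptions beyond what the abstract states.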