Counterfactual Explanations (CFEs) interpret machine learning models by identifying the smallest change to the input features that flips the model's prediction to a desired output. For classification tasks, CFEs quantify how close a given sample lies to the decision boundary of a trained classifier. Existing methods are often sample-inefficient, requiring numerous evaluations of a black-box model, which is both costly and impractical when access to the model is limited. We propose Adaptive sampling for Counterfactual Explanations (ACE), a sample-efficient algorithm that combines Bayesian estimation and stochastic optimization to approximate the decision boundary with fewer queries. By prioritizing informative points, ACE minimizes model evaluations while generating accurate and feasible CFEs. Extensive empirical results show that ACE achieves superior evaluation efficiency compared to state-of-the-art methods while remaining effective at identifying minimal and actionable changes.
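To make the query-efficiency concern concrete, the sketch below illustrates (not the ACE algorithm itself, which the abstract leaves unspecified) how a counterfactual can be located with few black-box queries by bisecting toward the decision boundary along the segment between a sample and a point of the desired class. The classifier `black_box` and both sample points are hypothetical stand-ins.

```python
import numpy as np

def black_box(x):
    # Hypothetical stand-in classifier: predicts class 1 iff the feature sum exceeds 1.
    return int(x.sum() > 1.0)

def boundary_search_cfe(x, x_target, predict, tol=1e-3, max_queries=30):
    """Illustrative bisection toward the decision boundary.

    Assumes predict(x) != predict(x_target). Returns a point just across the
    boundary on the segment between them, plus the number of model queries used.
    """
    y0 = predict(x)          # original prediction (1 query)
    lo, hi = 0.0, 1.0        # interpolation weights bracketing the boundary
    queries = 1
    while hi - lo > tol and queries < max_queries:
        mid = (lo + hi) / 2
        x_mid = x + mid * (x_target - x)
        queries += 1
        if predict(x_mid) == y0:
            lo = mid         # still on the original side of the boundary
        else:
            hi = mid         # crossed the boundary; tighten from above
    return x + hi * (x_target - x), queries

x = np.array([0.2, 0.3])     # original sample (class 0 under black_box)
x_t = np.array([0.9, 0.9])   # reference sample of the desired class
cfe, n_queries = boundary_search_cfe(x, x_t, black_box)
```

A plain bisection like this already needs only logarithmically many queries in the tolerance; the abstract's point is that adaptive, Bayesian-guided sampling can choose still more informative query points when each black-box evaluation is expensive.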