With the increasing impact of algorithmic decision-making on human lives, the interpretability of models has become a critical issue in machine learning. Counterfactual explanation is an important technique in interpretable machine learning: it helps users understand not only why a model made a particular decision, but also how that decision can be changed. It is therefore a natural and important task to study the robustness of counterfactual explanation generation algorithms to model changes. Prior work introduced the concept of Naturally-Occurring Model Change, which deepened our understanding of robustness to model change. In this paper, we first generalize this notion to a broader class of model parameter changes, which we call Generally-Occurring Model Change, and prove corresponding probabilistic guarantees. In addition, we study the more specific problem of dataset perturbation and derive related theoretical results using tools from optimization theory.