Explaining how machine learning algorithms work and/or make particular predictions is one of the main tools for improving their trustworthiness, fairness and robustness. Among the most intuitive types of explanation are counterfactuals: examples that differ from a given point only in the prediction target and some set of features, showing which features of the original example must change to flip its prediction. However, such counterfactuals can differ from the original example in many features, making them difficult to interpret. In this paper, we propose to explicitly add a cardinality constraint to counterfactual generation that limits how many features can differ from the original example, thus providing more interpretable and easily understandable counterfactuals.
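To make the idea of a cardinality constraint concrete, the following minimal sketch searches for the closest counterfactual that changes at most `max_changes` features. It is an illustration only, not the method proposed in the paper: the brute-force search, the toy linear classifier, the L1 distance, and all names (`predict`, `sparse_counterfactual`, `grid`, `max_changes`) are assumptions introduced here.

```python
from itertools import combinations, product


def predict(x, weights, bias):
    """Toy linear classifier (an assumption): 1 if w.x + b > 0, else 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def sparse_counterfactual(x, model, candidate_values, max_changes):
    """Brute-force search for the L1-closest point that flips the model's
    prediction while differing from x in at most max_changes features.
    Returns None if no such point exists on the candidate grid."""
    original = model(x)
    best, best_dist = None, float("inf")
    # Enumerating feature subsets of size <= max_changes enforces the
    # cardinality constraint by construction.
    for k in range(1, max_changes + 1):
        for idxs in combinations(range(len(x)), k):
            for values in product(*(candidate_values[i] for i in idxs)):
                cand = list(x)
                for i, v in zip(idxs, values):
                    cand[i] = v
                if model(cand) == original:
                    continue  # prediction not flipped
                dist = sum(abs(a - b) for a, b in zip(cand, x))
                if dist < best_dist:
                    best, best_dist = cand, dist
    return best


# Hypothetical example: three integer features, each may take 0, 4 or 15.
def model(x):
    return predict(x, weights=[1, 1, 1], bias=-10)


x = [0, 0, 0]
grid = [[0, 4, 15]] * 3

sparse = sparse_counterfactual(x, model, grid, max_changes=1)  # [15, 0, 0]
dense = sparse_counterfactual(x, model, grid, max_changes=3)   # [4, 4, 4]
print(sparse, dense)
```

In this toy setting the unconstrained search prefers `[4, 4, 4]`, which is closer in L1 distance but alters every feature, whereas limiting the search to one change yields `[15, 0, 0]`, which is farther away but points to a single feature. Exhaustive enumeration scales poorly with the number of features, so a practical approach would need a more efficient formulation.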