Counterfactual (CF) explanations, also known as contrastive explanations or algorithmic recourse, are popular for explaining machine learning models in high-stakes domains. For a subject who receives a negative model prediction (e.g., a mortgage application denial), a CF explanation is a similar instance with a positive prediction, which informs the subject of ways to improve. While various properties of CF explanations, such as validity and stability, have been studied, we contribute a novel one: their behavior under iterative partial fulfillment (IPF). Specifically, upon receiving a CF explanation, the subject may only partially fulfill it before requesting a new prediction with a new explanation, repeating this process until the prediction is positive. Such partial fulfillment could be due to the subject's limited capability (e.g., being able to pay down only two of four credit card accounts at the moment) or a decision to take a chance (e.g., betting that a monthly salary increase of $800 is enough even though $1,000 is recommended). Does such iterative partial fulfillment increase or decrease the total cost of improvement incurred by the subject? We mathematically formalize IPF and demonstrate, both theoretically and empirically, that different CF algorithms exhibit vastly different behaviors under IPF. We discuss the implications of our observations, advocate for this factor to be carefully considered in the development and study of CF algorithms, and give several directions for future work.
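The IPF loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the formalization or any CF algorithm from this work: the toy linear classifier (`W`, `B`), the hypothetical `counterfactual` generator (the nearest point in L2 distance that reaches a score of `MARGIN`), the fulfillment fraction `alpha`, and the use of cumulative L2 distance as the total cost of improvement.

```python
import math

# Toy linear classifier (illustrative values): positive when w . x + b >= 0.
W = [1.0, 2.0]
B = -10.0
MARGIN = 0.5  # hypothetical target score of the recommended counterfactual


def score(x):
    return sum(wi * xi for wi, xi in zip(W, x)) + B


def predict(x):
    return score(x) >= 0.0


def counterfactual(x):
    """Hypothetical CF generator: nearest point (L2) whose score equals MARGIN."""
    step = (MARGIN - score(x)) / sum(wi * wi for wi in W)
    return [xi + step * wi for xi, wi in zip(x, W)]


def ipf(x, alpha, max_rounds=1000):
    """Iterative partial fulfillment: in each round the subject moves a
    fraction alpha of the way toward the current CF, then requests a new
    prediction (and, if still negative, a new explanation)."""
    total_cost, rounds = 0.0, 0
    while not predict(x) and rounds < max_rounds:
        cf = counterfactual(x)
        new_x = [xi + alpha * (ci - xi) for xi, ci in zip(x, cf)]
        total_cost += math.dist(x, new_x)  # cost of this round (assumed L2)
        x = new_x
        rounds += 1
    return x, total_cost, rounds
```

Comparing `ipf([1.0, 1.0], alpha=0.5)` with `ipf([1.0, 1.0], alpha=1.0)` contrasts partial and full fulfillment for the same subject. In this linear toy setting every recommendation points in the same direction, so partial fulfillment cannot cost more than full fulfillment; this should not be read as representative, since the behavior under IPF depends on the CF algorithm.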