Machine learning models that automate decision-making are increasingly used in consequential areas such as loan approvals, pretrial bail approval, and hiring. Unfortunately, most of these models are black boxes, i.e., they cannot reveal how they reach their predictions. The need for transparency demands that such predictions be justified, and an affected individual may also want an explanation of why a decision was made. Ethical and legal considerations further require informing the individual of changes to the input attribute(s) that would produce a desirable outcome. Our work focuses on the latter problem: generating counterfactual explanations while accounting for the causal dependencies between features. In this paper, we present CFGs (CounterFactual Generation with s(CASP)), a framework that uses the goal-directed Answer Set Programming (ASP) system s(CASP) to automatically generate counterfactual explanations from models, in particular those generated by rule-based machine learning algorithms. We benchmark CFGs with the FOLD-SE model. The path from the initial state to the counterfactual state is planned and achieved through a series of interventions. To validate our proposal, we show how counterfactual explanations are computed and justified by imagining worlds where some or all factual assumptions are altered. More importantly, we show how CFGs navigates between these worlds, namely, going from the initial state, where we obtain an undesired outcome, to the imagined goal state, where we obtain the desired decision, while taking into account the causal relationships among features.
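To make the idea of planning a series of interventions under causal dependencies concrete, the following is a minimal sketch in Python. It is not the CFGs implementation and does not use s(CASP) or FOLD-SE: the features (`has_debt`, `high_credit_score`, `employed`), the decision rule, the causal rule, and all function names are hypothetical, and plain breadth-first search stands in for the goal-directed reasoning described above.

```python
from collections import deque

# Hypothetical toy domain, chosen only for illustration.
FEATURES = ("has_debt", "high_credit_score", "employed")
# Features an individual can change directly; the credit score can only
# change as a causal consequence of other features.
ACTIONABLE = ("has_debt", "employed")


def approved(state):
    """Stand-in for a learned rule-based model (assumed rule)."""
    return state["high_credit_score"] and state["employed"]


def apply_causal_rules(state):
    """Assumed causal dependency: having no debt makes the credit score high."""
    updated = dict(state)
    if not updated["has_debt"]:
        updated["high_credit_score"] = True
    return updated


def intervene(state, feature):
    """Flip one actionable feature, then propagate causal effects."""
    changed = dict(state)
    changed[feature] = not changed[feature]
    return apply_causal_rules(changed)


def plan_interventions(initial):
    """Breadth-first search for a shortest intervention sequence that
    leads from the initial state to a state with the desired outcome."""
    start = apply_causal_rules(initial)
    queue = deque([(start, [])])
    seen = {tuple(start[f] for f in FEATURES)}
    while queue:
        state, path = queue.popleft()
        if approved(state):
            return path
        for feature in ACTIONABLE:
            nxt = intervene(state, feature)
            key = tuple(nxt[f] for f in FEATURES)
            if key not in seen:
                seen.add(key)
                queue.append((nxt, path + [(feature, nxt[feature])]))
    return None  # no reachable state yields the desired outcome


initial_state = {"has_debt": True, "high_credit_score": False, "employed": False}
plan = plan_interventions(initial_state)
print(plan)
```

In this toy setting the credit score is never intervened on directly; it changes only as an effect of clearing the debt, which is the kind of causal dependency between features that the counterfactual path has to respect.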