Current deep learning models are not designed to simultaneously address three fundamental questions: predict class labels to solve a given classification task (the "What?"), explain task predictions (the "Why?"), and imagine alternative scenarios that could result in different predictions (the "What if?"). The inability to answer these questions represents a crucial gap in deploying reliable AI agents, calibrating human trust, and deepening human-machine interaction. To bridge this gap, we introduce CounterFactual Concept Bottleneck Models (CF-CBMs), a class of models designed to efficiently address the above queries all at once without the need to run post-hoc searches. Our results show that CF-CBMs produce: accurate predictions (the "What?"), simple explanations for task predictions (the "Why?"), and interpretable counterfactuals (the "What if?"). CF-CBMs can also sample or estimate the most probable counterfactual to: (i) explain the effect of concept interventions on tasks, (ii) show users how to get a desired class label, and (iii) propose concept interventions via "task-driven" interventions.
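To make the three queries concrete, the following is a minimal toy sketch and not the CF-CBM architecture. The concept names, the hand-written task rule, and every function name are hypothetical and chosen only for illustration. The counterfactual here is found by brute-force search over concept flips, which is exactly the kind of post-hoc search CF-CBMs are designed to avoid; it is used only to show what a "What if?" answer looks like at the concept level.

```python
from itertools import combinations

# Hypothetical toy setup: three binary concepts and a hand-written task rule.
# None of this comes from the paper; it only illustrates the three queries.
CONCEPTS = ["has_wings", "has_feathers", "can_swim"]


def predict_label(concepts):
    """The "What?": map a concept assignment to a class label (toy rule)."""
    if concepts["has_wings"] and concepts["has_feathers"]:
        return "bird"
    if concepts["can_swim"]:
        return "fish"
    return "other"


def explain(concepts):
    """The "Why?": list the concepts that are active for this input."""
    return [name for name in CONCEPTS if concepts[name]]


def counterfactual(concepts, target_label):
    """The "What if?": a smallest set of concept flips that yields target_label.

    Brute-force search for illustration only (assumption: CF-CBMs instead
    generate counterfactuals directly, without such a search).
    """
    for k in range(1, len(CONCEPTS) + 1):
        for flips in combinations(CONCEPTS, k):
            candidate = dict(concepts)
            for name in flips:
                candidate[name] = not candidate[name]
            if predict_label(candidate) == target_label:
                return list(flips), candidate
    return None


x = {"has_wings": True, "has_feathers": True, "can_swim": False}
label = predict_label(x)          # the "What?"
why = explain(x)                  # the "Why?"
cf = counterfactual(x, "fish")    # the "What if?"
```

In this sketch, the returned flips play the role of a concept intervention that shows a user how to reach a desired class label.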