Explaining artificial intelligence (AI) predictions is increasingly important, and even imperative, in many high-stakes applications where humans are the ultimate decision-makers. In this work, we propose two novel architectures of self-interpretable image classifiers that first explain and then predict (as opposed to producing post-hoc explanations) by harnessing the visual correspondences between a query image and exemplars. Compared with ResNet-50 and a $k$-nearest neighbor (kNN) classifier, our models consistently improve accuracy on out-of-distribution (OOD) datasets (by 1 to 4 points) while performing marginally worse on in-distribution tests (by 1 to 2 points). In a large-scale human study on ImageNet and CUB, we find that our correspondence-based explanations are more useful to users than kNN explanations. Our explanations also help users reject the AI's wrong decisions more accurately than all other tested methods. Interestingly, we show for the first time that it is possible to achieve complementary human-AI team accuracy (i.e., accuracy higher than that of either the AI alone or the human alone) in ImageNet and CUB image classification tasks.