Explainable AI (XAI) has gained significant attention for providing insights into the decision-making processes of deep learning models, particularly in image classification, where visual explanations are rendered as saliency maps. Despite their success, challenges remain due to the lack of annotated datasets and standardized evaluation pipelines. In this paper, we introduce Saliency-Bench, a novel benchmark suite for evaluating visual explanations generated by saliency methods across multiple datasets. We curated, constructed, and annotated eight datasets covering diverse tasks, including scene classification, cancer diagnosis, object classification, and action classification, each with corresponding ground-truth explanations. The benchmark includes a standardized, unified evaluation pipeline for assessing the faithfulness and alignment of visual explanations, providing a holistic assessment of explanation quality. We benchmark widely used saliency methods on these eight datasets across different image classifier architectures. Additionally, we developed an easy-to-use API that automates the evaluation pipeline, from data access and data loading to result evaluation. The benchmark is available via our website: https://xaidataset.github.io.
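To make the notion of "alignment" concrete, the sketch below scores how well a saliency map overlaps a binary ground-truth explanation mask using intersection-over-union after thresholding. This is an illustrative metric only, assuming NumPy; the function name, threshold scheme, and signature are hypothetical and do not reflect the actual Saliency-Bench API.

```python
import numpy as np

def alignment_iou(saliency, gt_mask, threshold=0.5):
    """Illustrative alignment score: IoU between the saliency map,
    binarized at `threshold` * its maximum value, and a binary
    ground-truth mask. (Hypothetical sketch, not the Saliency-Bench API.)"""
    sal = saliency >= threshold * saliency.max()
    gt = gt_mask.astype(bool)
    inter = np.logical_and(sal, gt).sum()
    union = np.logical_or(sal, gt).sum()
    return inter / union if union else 0.0

# Toy example: a 4x4 saliency map whose high values coincide
# exactly with the annotated 2x2 ground-truth region.
saliency = np.array([[0.9, 0.8, 0.1, 0.0],
                     [0.7, 0.6, 0.2, 0.1],
                     [0.1, 0.2, 0.0, 0.0],
                     [0.0, 0.1, 0.0, 0.0]])
gt_mask = np.zeros((4, 4))
gt_mask[:2, :2] = 1
score = alignment_iou(saliency, gt_mask)
print(round(score, 3))  # → 1.0 (perfect overlap in this toy case)
```

A faithfulness metric, by contrast, would perturb the input according to the saliency map and measure the change in the classifier's output rather than comparing against an annotation.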