Explainable AI (XAI) has gained significant attention for providing insights into the decision-making processes of deep learning models, particularly in image classification, where visual explanations are rendered as saliency maps. Despite their success, challenges remain due to the lack of annotated datasets and standardized evaluation pipelines. In this paper, we introduce Saliency-Bench, a novel benchmark suite for evaluating visual explanations generated by saliency methods across multiple datasets. We curated, constructed, and annotated eight datasets covering diverse tasks, including scene classification, cancer diagnosis, object classification, and action classification, each with corresponding ground-truth explanations. The benchmark includes a standardized, unified evaluation pipeline for assessing the faithfulness and alignment of visual explanations, providing a holistic assessment of explanation quality. We benchmark widely used saliency methods on these eight datasets across different image classifier architectures. Additionally, we developed an easy-to-use API that automates the evaluation pipeline, from data access and data loading to result evaluation. The benchmark is available via our website: https://xaidataset.github.io.
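To make the notion of "alignment" concrete, the sketch below scores how well a saliency map overlaps a binary ground-truth explanation mask using intersection-over-union after thresholding. This is an illustrative metric only, assuming NumPy; the function name, threshold scheme, and signature are hypothetical and do not reflect the actual Saliency-Bench API.

```python
import numpy as np

def alignment_iou(saliency, gt_mask, threshold=0.5):
    """Illustrative alignment score: IoU between the saliency map,
    binarized at `threshold` * its maximum value, and a binary
    ground-truth mask. (Hypothetical sketch, not the Saliency-Bench API.)"""
    sal = saliency >= threshold * saliency.max()
    gt = gt_mask.astype(bool)
    inter = np.logical_and(sal, gt).sum()
    union = np.logical_or(sal, gt).sum()
    return inter / union if union else 0.0

# Toy example: a 4x4 saliency map whose high values coincide
# exactly with the annotated 2x2 ground-truth region.
saliency = np.array([[0.9, 0.8, 0.1, 0.0],
                     [0.7, 0.6, 0.2, 0.1],
                     [0.1, 0.2, 0.0, 0.0],
                     [0.0, 0.1, 0.0, 0.0]])
gt_mask = np.zeros((4, 4))
gt_mask[:2, :2] = 1
score = alignment_iou(saliency, gt_mask)
print(round(score, 3))  # → 1.0 (perfect overlap in this toy case)
```

A faithfulness metric, by contrast, would perturb the input according to the saliency map and measure the change in the classifier's output rather than comparing against an annotation.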