On the Effectiveness of Methods and Metrics for Explainable AI in Remote Sensing Image Scene Classification

Source: arXiv. The code for this work will be made publicly available at https://git.tu-berlin.de/rsim/xai4rs. Accepted at the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

The development of explainable artificial intelligence (xAI) methods for scene classification problems has attracted great attention in remote sensing (RS). Most xAI methods and the related evaluation metrics used in RS were initially developed for natural images considered in computer vision (CV), and their direct use in RS may not be suitable. To address this issue, in this paper, we investigate the effectiveness of explanation methods and metrics in the context of RS image scene classification. In detail, we methodologically and experimentally analyze ten explanation metrics spanning five categories (faithfulness, robustness, localization, complexity, randomization), applied to five established feature attribution methods (Occlusion, LIME, GradCAM, LRP, and DeepLIFT) across three RS datasets. Our methodological analysis identifies key limitations in both explanation methods and metrics. The performance of perturbation-based methods, such as Occlusion and LIME, depends heavily on the perturbation baseline and on the spatial characteristics of RS scenes. Gradient-based approaches like GradCAM struggle when multiple labels are present in the same image, while some relevance propagation methods (LRP) can distribute relevance disproportionately relative to the spatial extent of classes. We find analogous limitations in the evaluation metrics. Faithfulness metrics share the same problems as perturbation-based methods. Localization metrics and complexity metrics are unreliable for classes with a large spatial extent. In contrast, robustness metrics and randomization metrics consistently exhibit greater stability. Our experimental results support these methodological findings. Based on our analysis, we provide guidelines for selecting explanation methods, metrics, and hyperparameters in the context of RS image scene classification.
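The dependence of perturbation-based methods on the perturbation baseline can be illustrated with a minimal sketch of occlusion attribution. This is not the authors' implementation: the function `occlusion_attribution`, its parameters, the two baseline options, and the `toy_model` scorer are all hypothetical names introduced here for illustration, and only NumPy is assumed.

```python
import numpy as np

def occlusion_attribution(model_fn, image, target, patch=8, baseline="zero"):
    """Illustrative occlusion map: drop in the target score when a patch is replaced.

    model_fn : callable mapping an (H, W, C) array to a 1-D array of class scores
               (hypothetical interface, assumed for this sketch).
    baseline : "zero" fills the patch with zeros, "mean" with the per-channel image mean.
    """
    h, w, c = image.shape
    if baseline == "zero":
        fill = np.zeros(c)
    elif baseline == "mean":
        fill = image.mean(axis=(0, 1))
    else:
        raise ValueError(f"unknown baseline: {baseline}")

    reference = model_fn(image)[target]  # score on the unperturbed image
    attribution = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill
            # A large score drop marks the patch as important for the target class.
            attribution[y:y + patch, x:x + patch] = reference - model_fn(occluded)[target]
    return attribution

def toy_model(image):
    """Hypothetical two-class scorer: class 0 reads the top-left quadrant,
    class 1 reads the bottom-right quadrant."""
    h, w, _ = image.shape
    return np.array([image[:h // 2, :w // 2].mean(), image[h // 2:, w // 2:].mean()])

# A spatially uniform scene, loosely standing in for a class with large spatial extent.
uniform_scene = np.ones((16, 16, 3))
map_zero = occlusion_attribution(toy_model, uniform_scene, target=0, baseline="zero")
map_mean = occlusion_attribution(toy_model, uniform_scene, target=0, baseline="mean")
```

On this uniform scene the zero baseline highlights the top-left quadrant, while the mean baseline leaves the image unchanged and yields an all-zero map. The same model and input thus produce different explanations depending only on the baseline choice, which is the kind of sensitivity the abstract describes for homogeneous RS scenes.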

