Local Agnostic Video Explanations: a Study on the Applicability of Removal-Based Explanations to Video

Explainable artificial intelligence techniques are becoming increasingly important with the rise of deep learning applications in various domains. These techniques aim to provide a better understanding of complex "black box" models and enhance user trust while maintaining high learning performance. While many studies have focused on explaining deep learning models in computer vision for image input, video explanations remain relatively unexplored due to the temporal dimension's complexity. In this paper, we present a unified framework for local agnostic explanations in the video domain. Our contributions include: (1) Extending a fine-grained explanation framework tailored for computer vision data, (2) Adapting six existing explanation techniques to work on video data by incorporating temporal information and enabling local explanations, and (3) Conducting an evaluation and comparison of the adapted explanation methods using different models and datasets. We discuss the possibilities and choices involved in the removal-based explanation process for visual data. The adaptation of six explanation methods for video is explained, with comparisons to existing approaches. We evaluate the performance of the methods using automated metrics and user-based evaluation, showing that 3D RISE, 3D LIME, and 3D Kernel SHAP outperform other methods. By decomposing the explanation process into manageable steps, we facilitate the study of each choice's impact and allow for further refinement of explanation methods to suit specific datasets and models.
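To make the removal-based idea concrete, the following is a minimal sketch of a RISE-style explanation extended to the temporal axis. It is not the authors' implementation: the function name `rise_3d`, its parameters, and the toy model are hypothetical, and it simplifies the original RISE procedure by using nearest-neighbour upsampling without random shifts and by normalising with per-voxel mask counts. Random spatio-temporal blocks of the video are removed (set to zero), the model is queried on each masked video, and every voxel is scored by the average model output over the masks that kept it.

```python
import numpy as np

def rise_3d(video, model, n_masks=500, grid=(2, 4, 4), p_keep=0.5, seed=0):
    """RISE-style saliency sketch for a video of shape (T, H, W) or (T, H, W, C).

    `model` is any callable mapping a video array to a scalar score
    (hypothetical interface, assumed for illustration).
    """
    rng = np.random.default_rng(seed)
    T, H, W = video.shape[:3]
    # Repeat factors that upsample the coarse grid to at least (T, H, W).
    reps = [int(np.ceil(s / g)) for s, g in zip((T, H, W), grid)]
    weighted = np.zeros((T, H, W), dtype=float)  # sum of score * mask
    kept = np.zeros((T, H, W), dtype=float)      # how often each voxel was kept
    for _ in range(n_masks):
        # Coarse binary mask: 1 keeps a block, 0 removes it.
        mask = (rng.random(grid) < p_keep).astype(float)
        # Nearest-neighbour upsampling along time, height and width.
        for axis, r in enumerate(reps):
            mask = np.repeat(mask, r, axis=axis)
        mask = mask[:T, :H, :W]
        # Add trailing singleton axes so the mask broadcasts over channels.
        m = mask.reshape(mask.shape + (1,) * (video.ndim - 3))
        score = float(model(video * m))
        weighted += score * mask
        kept += mask
    # Average score over the masks in which each voxel was visible.
    return weighted / np.maximum(kept, 1e-12)

# Toy usage: the "model" only looks at a small patch of frame 1.
video = np.ones((4, 8, 8))
model = lambda v: v[1, 2:4, 2:4].mean()
saliency = rise_3d(video, model)
```

In this toy setup the voxels inside the patch the model depends on receive a higher score than voxels elsewhere, which is the behaviour a removal-based explanation is meant to expose. Choices such as the grid size, the keep probability, and the value used to fill removed regions are exactly the kind of design decisions the framework decomposes.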

