Given the complexity and lack of transparency of deep neural networks (DNNs), extensive efforts have been made to make these systems more interpretable or to explain their behaviors in accessible terms. Unlike most reviews, which take algorithmic and model-centric perspectives, this work takes a "data-centric" view, examining how data collection, processing, and analysis contribute to explainable AI (XAI). We group existing work into three categories according to purpose: interpretations of deep models, covering feature attributions and reasoning processes that relate data points to model outputs; influences of training data, examining how nuances of the training data, as studied through data valuation and sample anomalies, affect decision-making processes; and insights of domain knowledge, discovering latent patterns and fostering new knowledge from data and models to advance social values and scientific discovery. Specifically, we distill XAI methodologies into data mining operations on training and testing data across modalities, such as images, text, and tabular data, as well as on training logs, checkpoints, models, and other DNN behavior descriptors. In this way, our study offers a comprehensive, data-centric examination of XAI through the lens of data mining methods and applications.