Explainable Artificial Intelligence (XAI) aims to understand how models perform feature selection and arrive at their classification decisions. This paper explores post-hoc explanations for deep neural networks in the audio domain. We present a novel open-source audio dataset consisting of 30,000 audio samples of English spoken digits, which we use for two classification tasks: spoken digits and speakers' biological sex. We use the popular XAI technique Layer-wise Relevance Propagation (LRP) to identify relevant features for two neural network architectures that process either waveform or spectrogram representations of the data. Based on the relevance scores obtained from LRP, we derive hypotheses about the neural networks' feature selection and test them through systematic manipulations of the input data. Further, we take a step beyond visual explanations and introduce audible heatmaps. In a human user study, we demonstrate the superior interpretability of audible explanations over visual ones.
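To make the notion of LRP relevance scores concrete, the following is a minimal sketch of the epsilon rule applied to a toy two-layer, bias-free ReLU network in NumPy. It is an illustration under stated assumptions, not the architectures or implementation used in this work: the function names `lrp_epsilon_dense` and `lrp_two_layer`, the random weights, and the 8-dimensional input (standing in for a flattened waveform or spectrogram patch) are all hypothetical.

```python
import numpy as np

def lrp_epsilon_dense(a, W, R_out, eps=1e-9):
    """Redistribute relevance R_out from a dense layer's outputs to its inputs a
    using the LRP epsilon rule (bias-free layer assumed)."""
    z = a @ W                                        # pre-activations, shape (n_out,)
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabilizer avoids division by zero
    s = R_out / z_stab                               # relevance per unit of pre-activation
    return a * (W @ s)                               # each input's share of the relevance

def lrp_two_layer(x, W1, W2, target):
    """Forward pass through a toy ReLU network, then propagate the target logit back."""
    h = np.maximum(0.0, x @ W1)                      # hidden ReLU activations
    logits = h @ W2
    R_out = np.zeros_like(logits)
    R_out[target] = logits[target]                   # start from the explained class only
    R_h = lrp_epsilon_dense(h, W2, R_out)
    R_x = lrp_epsilon_dense(x, W1, R_h)
    return logits, R_x

# Hypothetical toy input and random weights (assumptions for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=8)
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 10))                       # 10 outputs, e.g. digits 0-9
logits, R_x = lrp_two_layer(x, W1, W2, target=3)

# Input features ranked by absolute relevance, most relevant first.
ranking = np.argsort(-np.abs(R_x))
```

In this bias-free setting the input relevances sum approximately to the explained logit, which is the conservation property LRP is built around. A manipulation experiment in the spirit of the abstract would then perturb the top-ranked features in `ranking` and check how the prediction changes.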