Recent advances in neural networks have demonstrated remarkable capabilities across many domains, yet the "black box" problem remains. To address it, we propose WWW, a novel framework that explains the 'what', 'where', and 'why' of neural network decisions in human-understandable terms. Specifically, WWW uses adaptive selection for concept discovery, employing adaptive cosine similarity and thresholding to explain 'what'. To address 'where' and 'why', we propose a novel combination of neuron activation maps (NAMs) with Shapley values, generating localized concept maps and heatmaps for individual inputs. Furthermore, WWW introduces a method for predicting uncertainty that leverages heatmap similarities to estimate 'how' reliable a prediction is. In experimental evaluations, WWW outperforms existing interpretability methods on both quantitative and qualitative metrics. WWW provides a unified solution for explaining 'what', 'where', and 'why', introduces a method for deriving localized explanations from global interpretations, and offers a plug-and-play solution adaptable to various architectures.
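The abstract names two mechanisms without giving formulas: concept selection by cosine similarity with an adaptive threshold, and reliability estimation from heatmap similarity. The sketch below illustrates one plausible reading of each and is not the authors' implementation. The function names, the relative threshold `ratio`, and the choice of cosine similarity for comparing heatmaps are all assumptions made for illustration, and the abstract does not say which heatmaps are compared, so the sketch simply takes any two.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_concepts(neuron_vec, concept_vecs, ratio=0.9):
    """Keep every concept scoring within `ratio` of the best-matching concept.

    Assumption: "adaptive thresholding" is modeled as a threshold relative to
    the top score, so the cutoff moves with each neuron instead of being fixed.
    """
    scores = {
        name: cosine_similarity(neuron_vec, vec)
        for name, vec in concept_vecs.items()
    }
    best = max(scores.values())
    if best <= 0.0:
        return []  # no concept is positively aligned with this neuron
    threshold = ratio * best
    return sorted(name for name, score in scores.items() if score >= threshold)


def heatmap_agreement(heatmap_a, heatmap_b):
    """Cosine similarity of two flattened 2D heatmaps.

    Assumption: higher agreement is treated as a proxy for a more reliable
    prediction; the actual similarity measure may differ.
    """
    flat_a = [x for row in heatmap_a for x in row]
    flat_b = [x for row in heatmap_b for x in row]
    return cosine_similarity(flat_a, flat_b)


if __name__ == "__main__":
    # Toy embeddings; real ones would come from a neuron's representation
    # and a concept vocabulary.
    neuron = [1.0, 0.0]
    concepts = {"stripes": [1.0, 0.0], "dots": [0.0, 1.0], "lines": [0.9, 0.1]}
    print(select_concepts(neuron, concepts))
    print(heatmap_agreement([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

A relative threshold lets the number of concepts kept vary per neuron: a neuron with one dominant match keeps a single concept, while a neuron with several near-equal matches keeps them all.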