Despite recent progress in deep neural networks (DNNs), explaining their predictions remains challenging. Existing explanation methods mainly focus on post-hoc explanations, in which a separate explanatory model is used to explain the DNN. Because post-hoc methods can fail to reveal the DNN's actual reasoning process, there is a need for DNNs with built-in interpretability. Motivated by this, many self-explaining neural networks have been proposed that generate not only accurate predictions but also clear and intuitive insights into why a particular decision was made. However, existing self-explaining networks are limited in their ability to provide distribution-free uncertainty quantification for their two simultaneously generated outputs (i.e., a sample's final prediction and the corresponding explanations that interpret it). Importantly, they also fail to connect the confidence values assigned to the generated explanations in the interpretation layer with those assigned to the final predictions in the prediction layer. To address these challenges, we design a novel uncertainty modeling framework for self-explaining networks. The framework achieves strong distribution-free uncertainty modeling performance for the explanations generated in the interpretation layer, and it produces efficient and effective prediction sets for the final predictions based on the informative high-level basis explanations. We provide a theoretical analysis of the proposed framework, and extensive experimental evaluation demonstrates its effectiveness.
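The abstract does not say how the prediction sets are constructed. As a point of reference only, one standard way to obtain distribution-free prediction sets is split conformal prediction, sketched below for the final prediction layer. This is a generic illustration, not the framework proposed in the paper: the function names `conformal_quantile` and `prediction_sets`, the score `1 - p(true class)`, and the miscoverage level `alpha` are all assumptions introduced here.

```python
import math

import numpy as np


def conformal_quantile(cal_probs, cal_labels, alpha):
    """Threshold on nonconformity scores from a held-out calibration set.

    Assumed (hypothetical) score: 1 - predicted probability of the true class.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return float(np.sort(scores)[k - 1])


def prediction_sets(test_probs, qhat):
    """Return, per test sample, the class indices whose score is <= qhat."""
    test_probs = np.asarray(test_probs, dtype=float)
    return [np.flatnonzero(1.0 - p <= qhat).tolist() for p in test_probs]


if __name__ == "__main__":
    # Toy calibration data: 4 samples, 3 classes (illustrative values only).
    cal_probs = [[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.3, 0.4],
                 [0.6, 0.3, 0.1]]
    cal_labels = [0, 1, 2, 1]
    qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.5)
    print(prediction_sets([[0.5, 0.45, 0.05], [0.9, 0.05, 0.05]], qhat))
```

Under the usual exchangeability assumption between calibration and test samples, sets built this way contain the true label with probability at least `1 - alpha`, with no assumption on the data distribution. Smaller sets at the same coverage level are what "efficient" typically refers to in this setting.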