As artificial intelligence increasingly drives critical decisions, the ability to explain how neural networks arrive at their predictions is essential for trust. Yet most current explanation methods offer post-hoc rationalizations and do not guarantee a true reflection of the model's reasoning. We introduce the notion of explanatory alignment, the requirement that explanations directly construct predictions rather than rationalize them. To achieve this in complex data domains, we present Pointwise-interpretable Networks (PiNets), a pseudo-linear architecture that forms a linear model for each instance. Evaluated on image classification and segmentation tasks, PiNets produce explanations that are faithful across four criteria: meaningfulness, alignment, robustness, and sufficiency (MARS). By reconciling the predictive power of deep learning with the interpretability of linear models, PiNets provide a principled foundation for trustworthy AI and data-driven scientific discovery.
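
To make the idea of a pseudo-linear model concrete, the following is a minimal sketch in plain Python. It is not the PiNet architecture itself, whose details the abstract does not give. It assumes only the general form y = Σᵢ wᵢ(x)·xᵢ, where a small network generates the weights w(x) for each input. The names `make_weight_generator` and `pseudo_linear_predict`, the layer sizes, and the random initialization are all hypothetical choices made for illustration.

```python
import math
import random


def make_weight_generator(n_features, n_hidden, seed=0):
    """Return a tiny one-hidden-layer network mapping x -> w(x).

    Hypothetical stand-in for whatever component produces the
    instance-wise weights; untrained, randomly initialized.
    """
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_features)]

    def weights(x):
        # Nonlinear hidden layer, then a linear map back to one weight per feature.
        hidden = [math.tanh(sum(a * b for a, b in zip(row, x))) for row in w1]
        return [sum(a * b for a, b in zip(row, hidden)) for row in w2]

    return weights


def pseudo_linear_predict(weights, x):
    """Predict with y = sum_i w_i(x) * x_i and return the per-feature terms."""
    w = weights(x)
    contributions = [wi * xi for wi, xi in zip(w, x)]
    # The prediction is built from the explanation, not explained afterwards.
    return sum(contributions), contributions


weights = make_weight_generator(n_features=4, n_hidden=8)
x = [0.5, -1.0, 2.0, 0.0]
prediction, contributions = pseudo_linear_predict(weights, x)
```

In this sketch the per-feature `contributions` are the explanation, and the prediction is exactly their sum, which is one way to read the requirement that explanations construct predictions. The model is linear in `x` only for a fixed instance, because the weights themselves depend on `x`.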