Neural networks have achieved remarkable success in many scientific fields. However, the interpretability of neural network models remains a major bottleneck to deploying them in daily life. The challenge stems from the non-linear behavior of neural networks, which raises a critical question: how does a model use its input features to make a decision? The classical approach to this challenge is feature attribution, which assigns an importance score to each input feature to reveal its importance to the current prediction. However, current feature attribution approaches often indicate the importance of each input feature without detailing how the features are actually processed inside the model. This raises the concern of whether these approaches highlight the correct features for a model prediction. In a neural network, non-linear behavior is typically caused by the model's non-linear activation units. The computation behind a single prediction, however, is locally linear, because one prediction has only one activation pattern. Based on this observation, we propose an instance-wise linearization approach that reformulates the forward computation of a neural network prediction. The approach rewrites the different layers of a convolutional neural network as linear matrix multiplications. Aggregating the computation of all layers, the complex operations of a convolutional neural network for a given prediction can be described as a single linear matrix multiplication $F(x) = W \cdot x + b$. This equation not only provides a feature attribution map that highlights the importance of the input features but also tells exactly how each input feature contributes to the prediction. Furthermore, we discuss the application of this technique to both supervised classification and unsupervised neural network learning (parametric t-SNE dimension reduction).
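The locally linear view described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small fully connected ReLU network instead of a convolutional one, and the names `forward`, `linearize_relu_mlp`, `W_eff`, and `b_eff` are hypothetical. For one input, the ReLU activation pattern is frozen into 0/1 masks, and the layers are folded into a single pair $(W, b)$ so that $F(x) = W \cdot x + b$ holds for that input.

```python
import numpy as np


def forward(weights, biases, x):
    """Ordinary forward pass: ReLU after every layer except the last."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = W @ h + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h


def linearize_relu_mlp(weights, biases, x):
    """Fold the network into (W_eff, b_eff) valid for this particular x.

    Assumption: the only non-linearity is ReLU, so for a fixed input each
    hidden layer acts as multiplication by a diagonal 0/1 mask.
    """
    W_eff = np.eye(x.shape[0])
    b_eff = np.zeros(x.shape[0])
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = W @ h + b
        # Compose the affine map of this layer with the accumulated one.
        W_eff = W @ W_eff
        b_eff = W @ b_eff + b
        if i < len(weights) - 1:
            mask = (z > 0).astype(z.dtype)  # activation pattern of this input
            W_eff = mask[:, None] * W_eff
            b_eff = mask * b_eff
            h = mask * z
        else:
            h = z
    return W_eff, b_eff


# Toy network with random parameters (illustrative sizes: 4 -> 8 -> 8 -> 3).
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
x = rng.standard_normal(sizes[0])

logits = forward(weights, biases, x)
W_eff, b_eff = linearize_relu_mlp(weights, biases, x)

# The linearized form reproduces the prediction for this input.
assert np.allclose(W_eff @ x + b_eff, logits)

# Per-feature contributions to the predicted class: row of W_eff times x.
c = int(np.argmax(logits))
contributions = W_eff[c] * x
assert np.isclose(contributions.sum() + b_eff[c], logits[c])
```

In this sketch, `contributions` plays the role of a feature attribution map: its entries sum, together with `b_eff[c]`, to the output for class `c`. The pair `(W_eff, b_eff)` depends on the input, so it has to be recomputed for every instance.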