Transformer-based models have achieved major breakthroughs in recent years. However, many significant questions about why these models produce such powerful outputs remain unanswered. We do not know how to locate the important parameters that store the knowledge used to predict the next word, nor whether these parameters reside in the same layer/module or in different ones. Moreover, we do not understand the mechanism by which this knowledge is merged into the final embedding for next-word prediction. In this paper, we explore the residual stream of transformers to increase their interpretability. We find that the mechanism behind the residual connection is a direct addition on before-softmax values, so the probabilities of tokens with larger before-softmax values increase. We also prove that using the increase in log probability as a contribution score is reasonable, and on this basis we can locate important parameters. In addition, we propose a method for analyzing how lower layers affect upper layers by comparing inner products. Experimental results and a case study show that our research can increase the interpretability of transformer-based models. We will release our code at https://github.com/zepingyu0512/residualstream.
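The following is a minimal sketch of the additivity claim under simplifying assumptions. The toy unembedding matrix `W_U`, the residual vector `h`, the component output `v`, and all helper names are hypothetical. The sketch also omits the final layer normalization that real transformers typically apply before the unembedding. Because the unembedding is linear, adding a component's output `v` to the residual stream adds `W_U v` to the before-softmax values. Each token's log-probability increase then equals its before-softmax increase minus one normalization term shared by all tokens.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector, giving before-softmax values."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(a, b):
    """Element-wise vector addition (a residual-stream update)."""
    return [ai + bi for ai, bi in zip(a, b)]

def log_softmax(z):
    """Numerically stable log-softmax over before-softmax values."""
    m = max(z)
    lse = m + math.log(sum(math.exp(zi - m) for zi in z))
    return [zi - lse for zi in z]

# Hypothetical toy model: 3-token vocabulary, 2-dimensional residual stream.
# Assumption: no final layer normalization before the unembedding.
W_U = [[1.0, 0.0],    # unembedding row for token 0
       [0.0, 1.0],    # unembedding row for token 1
       [-1.0, -1.0]]  # unembedding row for token 2

h = [0.5, 0.2]  # residual stream before a component writes to it
v = [0.0, 1.5]  # hypothetical output of one attention or FFN component

# Linearity: before-softmax values of (h + v) equal those of h plus those of v.
before = matvec(W_U, h)
after = matvec(W_U, vadd(h, v))
delta = matvec(W_U, v)
assert all(abs(a - (b + d)) < 1e-12 for a, b, d in zip(after, before, delta))

def contribution(W, h, v, token):
    """Log-probability increase of `token` when component output v is added to h."""
    return log_softmax(matvec(W, vadd(h, v)))[token] - log_softmax(matvec(W, h))[token]

scores = [contribution(W_U, h, v, t) for t in range(len(W_U))]
print(scores)
```

Every token's score is shifted by the same log-normalizer difference, so the ranking of log-probability increases matches the ranking of before-softmax increases (`delta`). In this toy, token 1 receives the largest before-softmax boost and is the only token whose log probability rises.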