Given that Transformers are ubiquitous across a wide range of tasks, interpreting their internals is a pivotal issue. Still, one of their components, the feed-forward (FF) block, has typically received less analysis despite accounting for a substantial share of the parameters. We analyze the input contextualization effects of FF blocks by rendering them in attention maps, a human-friendly visualization scheme. Our experiments with both masked and causal language models reveal that FF networks modify the input contextualization to emphasize specific types of linguistic compositions. In addition, the FF block and its surrounding components tend to cancel out each other's effects, suggesting potential redundancy in the processing of the Transformer layer.