This paper explores the much-discussed possible explanatory link between attention weights (AW) in transformer models and predicted output. Contrary to intuition and to early research on attention, more recent work has offered formal arguments and empirical evidence that AW are not explanatorily relevant. We show that those formal arguments are incorrect. We introduce, and show how to compute, efficient attention, which isolates the effective components of attention matrices in tasks and models where AW play an explanatory role. We show that efficient attention plays a causal role in predicting model output on NLP tasks that require contextual information (it provides minimally necessary and sufficient conditions), and we show, contrary to [7], that efficient attention matrices are probability distributions and are effectively calculable. They should therefore play an important part in explaining the behavior of attention-based models. We support our method with empirical experiments that illustrate several properties of efficient attention under different metrics on four datasets.
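As background for the decomposition the abstract alludes to, the sketch below (our own illustrative assumption, not the construction defined in the paper) shows in numpy how a row-stochastic attention matrix can be split into a component that actually reaches the layer output $AV$ and a residual component annihilated by the value matrix $V$; the paper's efficient attention further requires that the isolated component remain a probability distribution.

```python
# Illustrative sketch only: split attention A into the part that influences the
# output A @ V and a residual that does not. This follows the generic
# "effective attention" projection idea from prior work on attention analysis;
# it is NOT necessarily the paper's definition of efficient attention.
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 3                              # sequence length, value dim (d < T so a null space exists)

A = rng.random((T, T))
A = A / A.sum(axis=1, keepdims=True)     # row-stochastic attention weights
V = rng.standard_normal((T, d))          # value matrix

P = V @ np.linalg.pinv(V)                # orthogonal projector onto the column space of V
A_eff = A @ P                            # component of A that reaches the output
A_null = A - A_eff                       # component annihilated by V

assert np.allclose(A @ V, A_eff @ V)             # the output depends only on A_eff
assert np.allclose(A_null @ V, 0, atol=1e-10)    # the residual has no effect
```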