In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks. In vision networks, where saliency has been studied more extensively, saliency is naturally localized by the network's convolutional layers; the same is not true of the transformer-stack networks now used to process natural language. We adapt gradient-based saliency methods to these networks, propose a method for evaluating the degree of semantic coherence of each layer, and demonstrate consistent improvement over numerous other textual saliency methods on multiple benchmark classification datasets. Our approach requires no additional training and no access to labelled data, and it is computationally efficient by comparison.