Long-sequence transformers are designed to help language models represent longer texts and to improve their performance on downstream document-level tasks. However, little is known about the quality of token-level predictions in long-form models. We investigate the performance of such architectures in the context of document classification with unsupervised rationale extraction. We find that standard soft attention methods perform significantly worse when combined with the Longformer language model. We propose a compositional soft attention architecture that applies RoBERTa sentence-wise to extract plausible rationales at the token level. This method significantly outperforms Longformer-driven baselines on sentiment classification datasets while also having significantly lower runtimes.
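The compositional idea can be illustrated with a minimal sketch. This is not the authors' implementation, and several pieces are assumptions. The hypothetical `encode_sentence` stands in for RoBERTa and returns deterministic toy vectors per token, so unlike RoBERTa it is not contextual. The scoring vector `w` and classifier vector `u` are untrained random placeholders. The attention is assumed to be a single dot-product score per token, normalized with one softmax over every token in the document. The sketch shows that each sentence is encoded on its own, so no encoder call sees more than one sentence. The soft attention weights still span the whole document, and they double as token-level rationale scores.

```python
import math
import random
from typing import List, Tuple

DIM = 8  # toy hidden size (an assumption; RoBERTa-base uses 768)


def encode_sentence(tokens: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for running RoBERTa on ONE sentence.

    Returns one vector per token, seeded by the token string so the
    output is deterministic. A real encoder would be contextual.
    """
    vecs = []
    for tok in tokens:
        rng = random.Random(f"tok:{tok}")
        vecs.append([rng.uniform(-1.0, 1.0) for _ in range(DIM)])
    return vecs


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def compositional_attention(
    sentences: List[str], w: List[float], u: List[float]
) -> Tuple[List[Tuple[str, float]], float]:
    tokens: List[str] = []
    states: List[List[float]] = []
    # Encode each sentence independently, then concatenate token states.
    for sent in sentences:
        toks = sent.split()
        tokens.extend(toks)
        states.extend(encode_sentence(toks))
    # One attention score per token; a single softmax across the document.
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in states]
    alphas = softmax(scores)
    # Document vector = attention-weighted sum of token states.
    doc = [sum(a * h[d] for a, h in zip(alphas, states)) for d in range(DIM)]
    # Binary sentiment probability from a linear layer plus sigmoid.
    logit = sum(ui * di for ui, di in zip(u, doc))
    prob = 1.0 / (1.0 + math.exp(-logit))
    return list(zip(tokens, alphas)), prob


# Usage with untrained placeholder parameters.
rng = random.Random(42)
w = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
u = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
document = ["the plot was dull", "but the acting was superb"]
weights, p = compositional_attention(document, w, u)
# The top-k attended tokens serve as the extracted rationale.
rationale = [t for t, _ in sorted(weights, key=lambda x: -x[1])[:3]]
print(rationale, round(p, 3))
```

Because the encoder only ever receives single sentences, each call stays well within a standard-length model's input limit. Only the lightweight attention layer operates over the full document.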