In this paper, we propose an extension to the Longformer Encoder-Decoder (LED), a popular sparse transformer architecture. A common challenge with sparse transformers is that they can struggle to encode long-range context, such as connections between topics discussed at the beginning and end of a document. We propose a method to selectively increase global attention and demonstrate it on abstractive summarization across several benchmark data sets. By prefixing the transcript with additional keywords and enabling global attention on these keyword tokens, we demonstrate improvements in zero-shot, few-shot, and fine-tuned settings on some of these benchmarks.
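As a rough illustration of the idea described above, the sketch below shows how keyword-prefixed global attention could be wired up with the Hugging Face LED implementation. The checkpoint name, keyword string, and separator are illustrative assumptions for this sketch, not the paper's exact configuration.

```python
import torch
from transformers import LEDTokenizer, LEDForConditionalGeneration

# Assumed checkpoint for illustration; the paper's model/checkpoint may differ.
tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

keywords = "budget staffing deadlines"   # hypothetical keyword prefix
document = "..."                          # the long transcript to summarize

# Prefix the document with the keywords before tokenization.
inputs = tokenizer(
    keywords + " " + document,
    return_tensors="pt",
    truncation=True,
    max_length=16384,
)

# LED defaults to local (windowed) attention; mark the keyword prefix
# (and the leading special token) for global attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
num_prefix_tokens = len(tokenizer(keywords)["input_ids"])
global_attention_mask[:, :num_prefix_tokens] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Because the keyword tokens attend globally, they can act as a bridge between distant parts of the document while the rest of the input keeps the cheaper windowed attention pattern.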