Self-attention-based models have achieved remarkable progress in short-text mining. However, their quadratic computational complexity restricts their application to long-text processing. Prior works adopt a chunking strategy: they divide long documents into chunks and stack a self-attention backbone with a recurrent structure to extract semantic representations. Such an approach prevents parallelization of the attention mechanism, significantly increasing training cost and raising hardware requirements. Revisiting the self-attention mechanism and the recurrent structure, this paper proposes a novel long-document encoding model, the Recurrent Attention Network (RAN), which enables the recurrent operation of self-attention. Combining the advantages of both, RAN extracts global semantics in both token-level and document-level representations, making it inherently compatible with sequential and classification tasks, respectively. Furthermore, RAN is computationally scalable because it supports parallelization when processing long documents. Extensive experiments on both classification and sequential tasks demonstrate the long-text encoding ability of the proposed RAN model, showing its potential for a wide range of applications.
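To make the quadratic-cost argument concrete, the following minimal sketch counts the query-key pairs that full self-attention scores, and compares that with attention restricted to fixed-size chunks. It is not the RAN algorithm. It only illustrates the generic chunking strategy the abstract refers to, and it ignores any state carried between chunks. The function names and the window size of 512 are assumptions chosen for illustration.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored when every token attends to every token."""
    return n_tokens * n_tokens


def chunked_attention_pairs(n_tokens: int, window: int) -> int:
    """Query-key pairs scored when attention is confined to each chunk.

    The document is split into consecutive chunks of `window` tokens;
    the last chunk may be shorter. Cross-chunk interaction is ignored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    full_chunks, remainder = divmod(n_tokens, window)
    return full_chunks * window * window + remainder * remainder


if __name__ == "__main__":
    window = 512  # hypothetical chunk size, not taken from the paper
    for n in (512, 4096, 32768):
        full = full_attention_pairs(n)
        chunked = chunked_attention_pairs(n, window)
        print(f"n={n:>6}  full={full:>12}  chunked={chunked:>10}")
```

With a fixed window, the chunked count grows linearly in document length while the full count grows quadratically. This count says nothing about whether the chunks can be processed in parallel, which is the separate limitation the abstract attributes to earlier recurrent designs.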