Transformer-based architectures have advanced text summarization, yet their quadratic complexity limits scalability on long documents. This paper introduces BiSparse-AAS (Bilinear Sparse Attention with Adaptive Spans), a novel framework that combines sparse attention, adaptive spans, and bilinear attention to address these limitations. Sparse attention reduces computational costs by focusing on the most relevant parts of the input, while adaptive spans dynamically adjust the attention ranges. Bilinear attention complements both by modeling complex token interactions within this refined context. BiSparse-AAS consistently outperforms state-of-the-art baselines in both extractive and abstractive summarization tasks, achieving average ROUGE improvements of about 68.1% on CNN/DailyMail and 52.6% on XSum, while maintaining strong performance on OpenWebText and Gigaword datasets. By addressing efficiency, scalability, and long-sequence modeling, BiSparse-AAS provides a unified, practical solution for real-world text summarization applications.
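To make the three components concrete, the sketch below shows one plausible way they could be combined in a single attention head: top-k sparsification of the score matrix, a learned soft span that down-weights distant positions, and a bilinear form between queries and keys. This is a minimal illustration only, not the paper's implementation; the class name, the single-parameter span, the top-k threshold, and all hyperparameters (`max_span`, `top_k`) are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiSparseAdaptiveAttention(nn.Module):
    """Illustrative single-head attention combining (hypothetical sketch):
    (1) top-k sparse attention, (2) a learned adaptive span that soft-masks
    distant positions, and (3) a bilinear query-key interaction."""

    def __init__(self, d_model, max_span=128, top_k=32):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Bilinear form W generalizes the dot product: score = q^T W k.
        self.bilinear = nn.Parameter(torch.eye(d_model))
        # One learnable span parameter (sigmoid -> fraction of max_span).
        self.span_logit = nn.Parameter(torch.tensor(0.0))
        self.max_span = max_span
        self.top_k = top_k
        self.scale = d_model ** -0.5

    def forward(self, x):                            # x: (batch, seq, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Bilinear attention scores q W k^T, scaled as in standard attention.
        scores = torch.einsum("bid,de,bje->bij", q, self.bilinear, k) * self.scale

        # Adaptive span: softly down-weight positions beyond the learned span.
        seq_len = x.size(1)
        pos = torch.arange(seq_len, device=x.device)
        dist = (pos[None, :] - pos[:, None]).abs().float()    # (seq, seq)
        span = torch.sigmoid(self.span_logit) * self.max_span
        span_mask = torch.clamp(1.0 - (dist - span) / 8.0, 0.0, 1.0)
        scores = scores + torch.log(span_mask + 1e-9)

        # Sparse attention: keep only the top-k scores per query position.
        k_eff = min(self.top_k, seq_len)
        kth_score = scores.topk(k_eff, dim=-1).values[..., -1:]
        scores = scores.masked_fill(scores < kth_score, float("-inf"))

        attn = F.softmax(scores, dim=-1)
        return attn @ v

# Example usage on a toy batch.
out = BiSparseAdaptiveAttention(d_model=64)(torch.randn(2, 256, 64))
print(out.shape)  # torch.Size([2, 256, 64])
```

In this sketch the span mask is applied additively in log space so it composes with the top-k mask before the softmax; a production implementation would likely use blockwise or per-head spans and avoid materializing the full score matrix to realize the efficiency gains the abstract describes.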