The transformer is a powerful data modelling framework that has achieved remarkable performance on a wide range of tasks. However, it scales poorly, since processing long-sequence data with it is inefficient and suboptimal. To this end, we introduce BLRP (Bidirectional Long-Range Parser), a novel and versatile attention mechanism designed to increase performance and efficiency on long-sequence tasks. It leverages short- and long-range heuristics by combining a local sliding-window approach with a global bidirectional latent-space synthesis technique. We show the benefits and versatility of our approach in the vision and language domains by demonstrating competitive results against state-of-the-art methods on the Long-Range-Arena and CIFAR benchmarks, together with ablations demonstrating its computational efficiency.
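To make the local half of this idea concrete, the following is a minimal sketch of generic single-head sliding-window attention, in which each position attends only to neighbours within a fixed distance. It is not the BLRP implementation: the function name `sliding_window_attention`, the `window` parameter, and the plain-list data layout are assumptions made for illustration, and the global bidirectional latent-space component is not covered here.

```python
import math


def sliding_window_attention(q, k, v, window):
    """Generic local attention sketch (hypothetical helper, not BLRP itself).

    q, k, v: lists of n vectors (lists of floats); q and k share dimension d.
    Position i attends only to positions j with |i - j| <= window.
    Returns (outputs, weights), where weights is a full n x n matrix
    with zeros outside the window.
    """
    n, d = len(q), len(q[0])
    outputs, weights = [], []
    for i in range(n):
        # Clip the window at the sequence boundaries.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scaled dot-product scores against keys inside the window only.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(lo, hi)
        ]
        # Numerically stable softmax over the window.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        row = [0.0] * n
        row[lo:hi] = probs
        weights.append(row)
        # Weighted sum of the values inside the window.
        outputs.append([
            sum(p * v[j][c] for p, j in zip(probs, range(lo, hi)))
            for c in range(len(v[0]))
        ])
    return outputs, weights
```

Because each query scores at most `2 * window + 1` keys, the cost of this sketch grows linearly with sequence length for a fixed window, rather than quadratically as in full attention. That is the general motivation for local attention on long sequences.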