The Receptance Weighted Key Value (RWKV) model offers a novel alternative to the Transformer architecture, merging the benefits of recurrent and attention-based systems. Unlike conventional Transformers, which rely heavily on self-attention, RWKV captures long-range dependencies with minimal computational overhead. By adopting a recurrent formulation, RWKV addresses some of the computational inefficiencies of Transformers, particularly on tasks involving long sequences. RWKV has recently drawn considerable attention for its strong performance across multiple domains. Despite its growing popularity, no systematic review of the RWKV model exists. This paper seeks to fill that gap as the first comprehensive review of the RWKV architecture, its core principles, and its varied applications, such as natural language generation, natural language understanding, and computer vision. We assess how RWKV compares to traditional Transformer models, highlighting its ability to handle long sequences efficiently and at lower computational cost. Furthermore, we examine the challenges RWKV faces and propose potential directions for future research and development. We maintain a continuously updated collection of related open-source materials at: https://github.com/MLGroupJLU/RWKV-Survey.