We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models that improve upon the RWKV (RWKV-4) architecture. Our architectural advances include multi-headed matrix-valued states and a dynamic recurrence mechanism, which improve expressivity while maintaining the inference-efficiency characteristics of RNNs. We introduce a new multilingual corpus of 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters, and found that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.
Models at: https://huggingface.co/RWKV
Training code at: https://github.com/RWKV/RWKV-LM
Inference code at: https://github.com/RWKV/ChatRWKV
Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer
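To illustrate the general technique behind the fast tokenizer, here is a minimal sketch of greedy longest-match tokenization. The toy vocabulary and function name are hypothetical, not the paper's; the actual RWKV tokenizer operates over a much larger multilingual vocabulary with an optimized matching structure.

```python
# Hypothetical sketch of greedy longest-match tokenization.
# At each position, emit the longest vocabulary entry that matches,
# then advance past it.

def greedy_tokenize(text: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Tokenize `text` by repeatedly taking the longest matching piece."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return ids

# Toy vocabulary for demonstration only.
vocab = {"t": 1, "h": 2, "e": 3, "the": 4, " ": 5, "cat": 6, "c": 7, "a": 8}
print(greedy_tokenize("the cat", vocab, max_len=3))  # [4, 5, 6]
```

Because matching is greedy rather than based on learned merge rules, a single left-to-right pass suffices, which is what makes this style of tokenizer fast and simple to implement across many scripts.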