We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language tasks. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction-tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
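To illustrate why a fixed-size state yields constant-memory inference, here is a minimal sketch of a diagonal linear recurrence, the basic mechanism underlying such architectures. All names, shapes, and the gating scheme are illustrative assumptions for exposition, not RecurrentGemma's actual implementation: unlike a transformer's KV cache, which grows with sequence length, the recurrent state is overwritten in place each step.

```python
import numpy as np

def linear_recurrence_step(h, x, a, B):
    # h: fixed-size recurrent state, shape (d,)
    # x: current input embedding, shape (d_in,)
    # a: per-channel decay gates in (0, 1), shape (d,)
    # B: input projection matrix, shape (d, d_in)
    # One step of a diagonal linear recurrence: the state is updated
    # in place, so memory stays O(d) regardless of sequence length.
    return a * h + B @ x

# Hypothetical toy dimensions, chosen only for the demo.
d, d_in, seq_len = 8, 4, 1000
rng = np.random.default_rng(0)
a = 1.0 / (1.0 + np.exp(-rng.normal(size=d)))   # sigmoid -> decays in (0, 1)
B = rng.normal(size=(d, d_in)) / np.sqrt(d_in)

h = np.zeros(d)
for _ in range(seq_len):
    x = rng.normal(size=d_in)
    h = linear_recurrence_step(h, x, a, B)       # state size never grows

print(h.shape)  # (8,) -- constant, independent of seq_len
```

In this toy setting the memory footprint during generation is fixed at the state size d, whereas attention over the same 1000-step sequence would need a cache proportional to its length; local attention, as used in Griffin, bounds that cache by the window size.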