We introduce T5Gemma 2, the next generation of the T5Gemma family of lightweight open encoder-decoder models, featuring strong multilingual, multimodal, and long-context capabilities. T5Gemma 2 follows the adaptation recipe (via UL2) introduced in T5Gemma, which adapts a pretrained decoder-only model into an encoder-decoder model, and extends it from the text-only regime to the multimodal setting by building on the Gemma 3 models. We further propose two methods to improve efficiency: tied word embeddings, which share a single embedding table across the encoder and decoder, and merged attention, which unifies decoder self- and cross-attention into a single joint module. Experiments demonstrate the generality of the adaptation strategy across architectures and modalities, as well as the unique strength of the encoder-decoder architecture in long-context modeling. Similar to T5Gemma, T5Gemma 2 achieves comparable or better pretraining performance and significantly improved post-training performance relative to its Gemma 3 counterparts. We release the pretrained models (270M-270M, 1B-1B, and 4B-4B) to the community for future research.
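To make the two efficiency ideas concrete, the following is a minimal sketch in PyTorch, not the released implementation: it assumes hypothetical module names (`MergedAttentionDecoderLayer`, `TiedEmbeddingEncoderDecoder`) and simplifies away details such as joint attention masking, but illustrates how a single embedding table can be shared by the encoder, decoder, and output projection, and how one attention call over concatenated decoder and encoder states can stand in for separate self- and cross-attention.

```python
# Hedged sketch of tied word embeddings and merged attention (hypothetical names,
# simplified layers); not the T5Gemma 2 implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MergedAttentionDecoderLayer(nn.Module):
    """One attention module attending jointly over decoder and encoder states."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dec_states, enc_states, joint_mask=None):
        # Keys/values are the concatenation of decoder (self-attention) and
        # encoder (cross-attention) states, so a single attention call replaces
        # the usual two separate modules. A real model would pass a joint_mask
        # that is causal over the decoder part and open over the encoder part.
        kv = torch.cat([dec_states, enc_states], dim=1)
        out, _ = self.attn(dec_states, kv, kv, attn_mask=joint_mask)
        return self.norm(dec_states + out)


class TiedEmbeddingEncoderDecoder(nn.Module):
    """Encoder and decoder share one embedding table, reused as the LM head."""

    def __init__(self, vocab_size: int, d_model: int, n_heads: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # shared across enc/dec
        self.encoder_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True
        )
        self.decoder_layer = MergedAttentionDecoderLayer(d_model, n_heads)

    def forward(self, src_ids, tgt_ids):
        enc = self.encoder_layer(self.embed(src_ids))
        dec = self.decoder_layer(self.embed(tgt_ids), enc)
        # Output projection tied to the same shared embedding matrix.
        return F.linear(dec, self.embed.weight)
```

Under these assumptions, tying the embeddings removes the separate encoder, decoder, and output-projection tables, and merging attention removes one projection stack per decoder layer; both reduce parameters without changing the overall encoder-decoder interface.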