Encoders remain essential for efficient German NLP and NLU scenarios despite the rise of decoder-only LLMs. This work studies two routes to high-quality German encoders under identical data and training constraints: 1) training from scratch and 2) converting decoders via LLM2Vec. We introduce two resources: ModernGBERT (134M, 1B), fully transparent German encoders in the ModernBERT style, and LLäMmleinVec (120M, 1B, 7B), decoder-to-encoder conversions trained with masked next-token prediction, both undergoing context extension to 8,192 tokens. On SuperGLEBer, ModernGBERT 1B sets a new state of the art (avg. 0.808), surpassing GBERT Large (+4%) and the seven-times-larger converted 7B model (0.787). On German MTEB after supervised fine-tuning, ModernGBERT 1B (0.551) approaches the converted 7B model (0.557). We release all models, checkpoints, datasets, and full training records, and introduce an encoder-adapted QA-NIAH evaluation. Overall, our results provide actionable guidance: when parameter efficiency and latency matter, from-scratch encoders dominate; when a pre-trained decoder is available and compute is limited, conversion offers an effective alternative. ModernGBERT and LLäMmleinVec, including all code, data, and intermediate checkpoints, are published under a research-only RAIL license.