Since their initial release, BERT models have demonstrated exceptional performance on a variety of tasks, despite their relatively small size (BERT-base has ~100M parameters). Nevertheless, the architectural choices used in these models are outdated compared to newer transformer-based models such as Llama3 and Qwen3. In recent months, several architectures have been proposed to close this gap. ModernBERT and NeoBERT both show strong improvements on English benchmarks and significantly extend the supported context window. Following their successes, we introduce NeoDictaBERT and NeoDictaBERT-bilingual: BERT-style models trained using the same architecture as NeoBERT, with a dedicated focus on Hebrew texts. These models outperform existing ones on almost all Hebrew benchmarks and provide a strong foundation for downstream tasks. Notably, the NeoDictaBERT-bilingual model shows strong results on retrieval tasks, outperforming other multilingual models of similar size. In this paper, we describe the training process and report results across various benchmarks. We release the models to the community as part of our goal to advance research and development in Hebrew NLP.