NeoDictaBERT: Pushing the Frontier of BERT models for Hebrew

Since their initial release, BERT models have demonstrated exceptional performance on a variety of tasks, despite their relatively small size (BERT-base has ~100M parameters). Nevertheless, the architectural choices used in these models are outdated compared to newer transformer-based models such as Llama3 and Qwen3. In recent months, several architectures have been proposed to close this gap. ModernBERT and NeoBERT both show strong improvements on English benchmarks and significantly extend the supported context window. Following their successes, we introduce NeoDictaBERT and NeoDictaBERT-bilingual: BERT-style models trained using the same architecture as NeoBERT, with a dedicated focus on Hebrew texts. These models outperform existing ones on almost all Hebrew benchmarks and provide a strong foundation for downstream tasks. Notably, the NeoDictaBERT-bilingual model shows strong results on retrieval tasks, outperforming other multilingual models of similar size. In this paper, we describe the training process and report results across various benchmarks. We release the models to the community as part of our goal to advance research and development in Hebrew NLP.

