We present Knesset-DictaBERT, a large Hebrew language model fine-tuned on the Knesset Corpus, a collection of Israeli parliamentary proceedings. The model builds on the DictaBERT architecture and shows significant improvements in understanding parliamentary language on the masked-language modeling (MLM) task. We provide a detailed evaluation of the model's performance, demonstrating gains in perplexity and accuracy over the baseline DictaBERT model.
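The MLM evaluation summarized above can be illustrated with a short sketch: mask one token at a time, score the model's prediction against the true token, then aggregate per-token losses into perplexity and top-1 matches into accuracy. This is a minimal illustration, not the authors' evaluation script; the Hugging Face model ID `GiliGold/Knesset-DictaBERT` and the helper `masked_token_stats` are assumptions for the example.

```python
# Minimal sketch of an MLM evaluation in the spirit of the abstract.
# Assumption: the fine-tuned model is published under the hypothetical
# hub ID "GiliGold/Knesset-DictaBERT"; swap in the real ID as needed.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_ID = "GiliGold/Knesset-DictaBERT"  # hypothetical ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
model.eval()

def masked_token_stats(sentence: str, mask_index: int):
    """Mask one token and return the top prediction plus its negative
    log-likelihood for the true token.

    Averaging the NLL values over a held-out set and exponentiating
    gives perplexity; comparing top predictions to the true tokens
    gives MLM accuracy.
    """
    enc = tokenizer(sentence, return_tensors="pt")
    true_id = enc["input_ids"][0, mask_index].item()
    enc["input_ids"][0, mask_index] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(**enc).logits
    log_probs = torch.log_softmax(logits[0, mask_index], dim=-1)
    pred_id = int(log_probs.argmax())
    return tokenizer.decode([pred_id]), -log_probs[true_id].item()

# Usage: run over a test corpus, collecting (prediction, nll) pairs;
# perplexity ≈ exp(mean nll), accuracy = fraction of exact matches.
```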