Despite the rapid proliferation of large language models, there remains a notable gap in models specifically designed for the Dutch language. This gap is not only a shortage of pretrained Dutch models but also of data, benchmarks, and leaderboards. This work takes a small step toward improving the situation. First, we introduce two fine-tuned variants of the Llama 2 13B model. We fine-tuned Llama 2 on Dutch-specific web-crawled data and subsequently refined this model on multiple synthetic instruction and chat datasets. Both these datasets and the model weights are made publicly available. Second, we provide a leaderboard to track the performance of (Dutch) models on a number of generation tasks, and we include results for several state-of-the-art models, including our own. Finally, we offer a critical conclusion on what we believe is needed to advance Dutch language models and the broader ecosystem around them.