We present SwissBERT, a masked language model created specifically for processing Switzerland-related text. SwissBERT is a pre-trained model that we adapted to news articles written in the national languages of Switzerland -- German, French, Italian, and Romansh. We evaluate SwissBERT on natural language understanding tasks related to Switzerland and find that it tends to outperform previous models on these tasks, especially when processing contemporary news and/or Romansh Grischun. Since SwissBERT uses language adapters, it may be extended to Swiss German dialects in future work. The model and our open-source code are publicly released at https://github.com/ZurichNLP/swissbert.