Advances in language models across almost every kind of task have drawn attention from researchers and from society at large, and have turned these models into products. Commercially successful language models are available, but users may prefer open-source models for reasons of cost, data privacy, or regulation. Despite the growing number of open-source models, no comprehensive comparison of their performance in Turkish exists. This study aims to fill that gap. Seven selected language models are compared on their in-context learning and question-answering abilities. Turkish datasets for in-context learning and question answering were prepared, and both automatic and human evaluations were conducted. The results show that, for question answering, continuing pretraining before fine-tuning on instruction datasets is the more successful approach to adapting multilingual models to Turkish, and that in-context learning performance is not strongly related to question-answering performance.