The rapid spread of misinformation on social media platforms has raised concerns about its impact on public opinion. Although misinformation circulates in many languages, most research in this field has concentrated on English. As a result, datasets for other languages, including Turkish, are scarce. To address this gap, we introduce the FCTR dataset, which consists of 3238 real-world claims. The dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. We also assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish, and we report the in-context learning (zero-shot and few-shot) performance of large language models in this setting. The experimental results indicate that the dataset has the potential to advance research in the Turkish language.