This paper describes the IAI group's participation in automated check-worthiness estimation of claims within the 2024 CheckThat! Lab "Task 1: Check-Worthiness Estimation". The task involves automatically detecting check-worthy claims in English, Dutch, and Arabic political debates and Twitter data. We used several pre-trained generative decoder and encoder transformer models, applying few-shot chain-of-thought reasoning, fine-tuning, data augmentation, and cross-lingual transfer learning. Although performance varied, our models placed ninth in English, third in Dutch, and first in Arabic on the organizers' leaderboard, using multilingual datasets to improve the generalizability of check-worthiness detection. Performance dropped significantly on the unlabeled test dataset compared with the development test dataset; nevertheless, our findings contribute to ongoing claim detection research and highlight the challenges and potential of language-specific adaptations in claim verification systems.