The popularity of automated news headline generation has surged with advancements in pre-trained language models. However, these models often suffer from the ``hallucination'' problem, in which the generated headline is not fully supported by its source article. Efforts to address this issue have focused predominantly on English and rely on over-simplified classification schemes that overlook nuanced hallucination types. In this study, we introduce the first multilingual, fine-grained news headline hallucination detection dataset, containing over 11,000 pairs in 5 languages, each annotated with detailed hallucination types by experts. We conduct extensive experiments on this dataset under two settings. First, we implement several supervised fine-tuning approaches as baseline solutions and demonstrate the challenges and utility of this dataset. Second, we evaluate the in-context learning abilities of various large language models and propose two novel techniques, language-dependent demonstration selection and coarse-to-fine prompting, to improve few-shot hallucination detection performance as measured by the example-F1 metric. We release this dataset to foster further research in multilingual, fine-grained headline hallucination detection.
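The coarse-to-fine prompting idea can be illustrated with a minimal sketch: first ask the model a coarse binary question (is the headline fully supported?), and only if it is flagged, ask a second fine-grained question over the hallucination-type taxonomy. The prompt wording, the type labels, and the injectable `ask` callable below are illustrative assumptions, not the paper's exact implementation.

```python
def coarse_to_fine(article: str, headline: str, ask) -> str:
    """Two-step hallucination check: a coarse supported/unsupported question,
    then a fine-grained type question only for flagged headlines.

    `ask` is any callable that takes a prompt string and returns the model's
    short textual answer (e.g. a wrapper around an LLM API). The prompts and
    the three type labels here are hypothetical placeholders.
    """
    coarse = ask(
        f"Article: {article}\nHeadline: {headline}\n"
        "Is every claim in the headline supported by the article? Answer Yes or No."
    )
    if coarse.strip().lower().startswith("yes"):
        return "faithful"
    # Fine step: reached only when the coarse step flags a problem, so the
    # model chooses among hallucination types on a narrowed label space.
    fine = ask(
        f"Article: {article}\nHeadline: {headline}\n"
        "The headline is not fully supported by the article. Classify the "
        "hallucination as one of: entity mismatch, exaggeration, unsupported addition."
    )
    return fine.strip().lower()
```

Decomposing the decision this way lets the cheap coarse question filter faithful headlines before the model commits to a fine-grained label.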