The popularity of automated news headline generation has surged with advancements in pre-trained language models. However, these models often suffer from the ``hallucination'' problem, in which the generated headline is not fully supported by its source article. Efforts to address this issue have focused predominantly on English and rely on over-simplified classification schemes that overlook nuanced hallucination types. In this study, we introduce the first multilingual, fine-grained news headline hallucination detection dataset, containing over 11,000 pairs in 5 languages, each annotated with detailed hallucination types by experts. We conduct extensive experiments on this dataset under two settings. First, we implement several supervised fine-tuning approaches as baseline solutions and demonstrate the challenges and utility of this dataset. Second, we evaluate the in-context learning abilities of various large language models and propose two novel techniques, language-dependent demonstration selection and coarse-to-fine prompting, to improve few-shot hallucination detection performance as measured by the example-F1 metric. We release this dataset to foster further research in multilingual, fine-grained headline hallucination detection.
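The coarse-to-fine prompting idea can be illustrated with a minimal sketch: first ask the model a coarse binary question (is the headline fully supported?), and only if it is flagged, ask a second fine-grained question over the hallucination-type taxonomy. The prompt wording, the type labels, and the injectable `ask` callable below are illustrative assumptions, not the paper's exact implementation.

```python
def coarse_to_fine(article: str, headline: str, ask) -> str:
    """Two-step hallucination check: a coarse supported/unsupported question,
    then a fine-grained type question only for flagged headlines.

    `ask` is any callable that takes a prompt string and returns the model's
    short textual answer (e.g. a wrapper around an LLM API). The prompts and
    the three type labels here are hypothetical placeholders.
    """
    coarse = ask(
        f"Article: {article}\nHeadline: {headline}\n"
        "Is every claim in the headline supported by the article? Answer Yes or No."
    )
    if coarse.strip().lower().startswith("yes"):
        return "faithful"
    # Fine step: reached only when the coarse step flags a problem, so the
    # model chooses among hallucination types on a narrowed label space.
    fine = ask(
        f"Article: {article}\nHeadline: {headline}\n"
        "The headline is not fully supported by the article. Classify the "
        "hallucination as one of: entity mismatch, exaggeration, unsupported addition."
    )
    return fine.strip().lower()
```

Decomposing the decision this way lets the cheap coarse question filter faithful headlines before the model commits to a fine-grained label.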