We present NoticIA, a dataset of 850 Spanish news articles with prominent clickbait headlines, each paired with a high-quality, single-sentence generative summary written by humans. The task demands advanced text understanding and summarization abilities: a model must infer and connect diverse pieces of information in the article to satisfy the information need that the clickbait headline creates in the reader. We use the dataset to evaluate the Spanish text comprehension capabilities of a wide range of state-of-the-art large language models. Additionally, we use it to train ClickbaitFighter, a task-specific model that achieves near-human performance on this task.