Large language models (LLMs) are hindered by challenges in factuality and hallucination, which prevents them from being directly employed off-the-shelf to judge the veracity of news articles, where factual accuracy is paramount. In this work, we propose DELL, which identifies three key stages in misinformation detection where LLMs can be incorporated into the pipeline: 1) LLMs \emph{generate news reactions} to represent diverse perspectives and simulate user-news interaction networks; 2) LLMs \emph{generate explanations} for proxy tasks (e.g., sentiment, stance) to enrich the contexts of news articles and produce experts specializing in various aspects of news understanding; 3) LLMs \emph{merge task-specific experts} to provide an overall prediction by incorporating the predictions and confidence scores of the individual experts. Extensive experiments on seven datasets with three LLMs demonstrate that DELL outperforms state-of-the-art baselines by up to 16.8\% in macro F1-score. Further analysis reveals that the generated reactions and explanations substantially aid misinformation detection, while our proposed LLM-guided expert merging yields better-calibrated predictions.
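To make the three stages concrete, the following is a minimal Python sketch of the pipeline. It is an illustration under stated assumptions, not the authors' implementation: \texttt{call\_llm} stands in for a generic chat-completion interface, and every other name, prompt, and the stubbed expert prediction are hypothetical.

\begin{verbatim}
# Minimal sketch of the three-stage DELL pipeline described above.
# All names and prompts are hypothetical illustrations; call_llm is a
# stand-in for any chat-completion API and is stubbed out here.

from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Stub for an LLM call; replace with a real API client."""
    return "stub response"

@dataclass
class ExpertOutput:
    task: str            # proxy task, e.g. "sentiment" or "stance"
    prediction: str      # the expert's veracity label for the article
    confidence: float    # the expert's confidence score in [0, 1]

def generate_reactions(article: str, n_users: int = 3) -> list[str]:
    # Stage 1: simulate diverse user reactions to the news article,
    # approximating a user-news interaction network.
    return [
        call_llm(f"As user {i}, write a short comment on:\n{article}")
        for i in range(n_users)
    ]

def run_expert(article: str, reactions: list[str],
               task: str) -> ExpertOutput:
    # Stage 2: enrich the article context with an LLM-generated
    # explanation for a proxy task (e.g. sentiment, stance).
    context = article + "\n" + "\n".join(reactions)
    explanation = call_llm(
        f"Explain the {task} of this article:\n{context}")
    # A task-specific classifier would normally produce the label
    # and confidence; both are stubbed here for illustration.
    return ExpertOutput(task=task, prediction="real", confidence=0.5)

def merge_experts(article: str, experts: list[ExpertOutput]) -> str:
    # Stage 3: the LLM merges expert predictions and confidence
    # scores into an overall veracity judgment.
    summary = "\n".join(
        f"{e.task} expert: {e.prediction} "
        f"(confidence {e.confidence:.2f})"
        for e in experts
    )
    return call_llm(
        f"Given these expert opinions:\n{summary}\n"
        f"Is the following article real or fake?\n{article}"
    )

if __name__ == "__main__":
    article = "Example news article text."
    reactions = generate_reactions(article)
    experts = [run_expert(article, reactions, t)
               for t in ("sentiment", "stance")]
    print(merge_experts(article, experts))
\end{verbatim}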