Misinformation poses a variety of risks, such as undermining public trust and distorting factual discourse. Large Language Models (LLMs) like GPT-4 have been shown to be effective in mitigating misinformation, particularly when a statement comes with enough context. However, they struggle to accurately assess ambiguous or context-deficient statements. This work introduces a new method for resolving uncertainty in such statements. We propose a framework for categorizing missing information and publish category labels for the LIAR-New dataset; the framework is adaptable to cross-domain content with missing information. We then leverage this framework to generate effective user queries for the missing context. Compared to baselines, our method improves the rate at which generated questions are answerable by the user by 38 percentage points and classification performance by over 10 percentage points in macro F1. This approach may thus provide a valuable component for future misinformation mitigation pipelines.