Large Language Models (LLMs) have emerged as prime candidates for misinformation mitigation. However, existing approaches struggle with hallucinations and overconfident predictions. We propose an uncertainty quantification framework that leverages both direct confidence elicitation and sample-based consistency methods to provide better calibration for NLP misinformation mitigation solutions. We first investigate the calibration of sample-based consistency methods that exploit distinct features of consistency across sample sizes and levels of stochasticity. Next, we evaluate the performance and distributional shift of a robust numeric verbalization prompt under single-step vs. two-step confidence elicitation procedures. We also compare the performance of the same prompt across different versions of GPT and different numerical scales. Finally, we combine the sample-based consistency and verbalized methods into a hybrid framework that yields better uncertainty estimates for GPT models. Overall, our work proposes novel uncertainty quantification methods that will improve the reliability of LLMs in misinformation mitigation applications.
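To make the three signals concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that (a) sample-based consistency is the fraction of sampled answers agreeing with the majority answer, (b) a verbalized confidence on a numeric scale is linearly rescaled to [0, 1], (c) the hybrid score is a convex combination with a hypothetical `weight` parameter, and (d) calibration is measured with a binned expected calibration error (ECE). All function names are illustrative; the abstract does not specify the exact combination rule or metric.

```python
from collections import Counter
from typing import Sequence


def consistency_confidence(samples: Sequence[str]) -> tuple[str, float]:
    """Return the majority answer and the fraction of samples that agree with it.

    Assumption: consistency = agreement rate among N sampled responses.
    """
    counts = Counter(samples)
    answer, freq = counts.most_common(1)[0]
    return answer, freq / len(samples)


def normalize_verbalized(score: float, scale_max: float) -> float:
    """Rescale a verbalized confidence on a 0..scale_max scale to [0, 1], clipping outliers."""
    return min(max(score / scale_max, 0.0), 1.0)


def hybrid_confidence(consistency: float, verbalized: float, weight: float = 0.5) -> float:
    """Hypothetical hybrid score: a convex combination of the two signals."""
    return weight * consistency + (1.0 - weight) * verbalized


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: weighted mean |accuracy - confidence| over equal-width bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0.0 goes into the first bin.
        idx = [
            i for i, c in enumerate(confidences)
            if lo < c <= hi or (b == 0 and c == 0.0)
        ]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(avg_conf - acc)
    return ece


if __name__ == "__main__":
    # Five sampled labels for one claim, plus a verbalized score of 7 on a 0-10 scale.
    label, cons = consistency_confidence(["false", "false", "true", "false", "false"])
    verb = normalize_verbalized(7, 10)
    print(label, cons, verb, hybrid_confidence(cons, verb))
```

The convex combination keeps the hybrid score in [0, 1] whenever both inputs are, so it can be fed directly into the same ECE computation as either signal alone; in practice the weight would need to be chosen on held-out data.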