Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements which can undermine trust in their output. Existing fact-checking approaches either require access to the output probability distribution (which may not be available for systems such as ChatGPT) or external databases that are interfaced via separate, often complex, modules. In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can be used to fact-check the responses of black-box models in a zero-resource fashion, i.e. without an external database. SelfCheckGPT leverages the simple idea that if an LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another. We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset, and manually annotate the factuality of the generated passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality. We compare our approach to several baselines and show that our approach has considerably higher AUC-PR scores in sentence-level hallucination detection and higher correlation scores in passage-level factuality assessment compared to grey-box methods.
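The consistency idea described above can be illustrated with a minimal sketch. The abstract does not specify how agreement between sampled responses is measured, so the token-overlap score below is a stand-in assumption rather than the paper's scoring method, and the function names (`sentence_inconsistency`, `passage_inconsistency`) are hypothetical. The sampled passages are assumed to have been drawn beforehand by re-querying the same black-box LLM stochastically with the same prompt; that step is not shown.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(passage: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def sentence_inconsistency(sentence: str, samples: List[str]) -> float:
    """Score in [0, 1]; higher means the sentence is less supported by the samples.

    ASSUMPTION: support is measured as the fraction of the sentence's tokens
    that also appear in each sampled passage, averaged over all samples.
    This is an illustrative proxy, not the metric used in the paper.
    """
    tokens = tokenize(sentence)
    if not tokens or not samples:
        return 0.0  # nothing to compare, so report no evidence of inconsistency
    support = [len(tokens & tokenize(sample)) / len(tokens) for sample in samples]
    return 1.0 - sum(support) / len(support)


def passage_inconsistency(passage: str, samples: List[str]) -> float:
    """Average the sentence-level scores to obtain a passage-level score."""
    scores = [sentence_inconsistency(s, samples) for s in split_sentences(passage)]
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a sentence whose content recurs across the samples receives a score near 0, while a sentence that the samples do not echo receives a score near 1. Sorting passages by `passage_inconsistency` then gives a rough factuality ranking. A lexical proxy like this cannot detect contradictions phrased with the same words, so it only conveys the shape of the approach.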