Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements, which can undermine trust in their output. Existing fact-checking approaches either require access to token-level output probability distributions (which may not be available for systems such as ChatGPT) or rely on external databases that are accessed through separate, often complex, modules. In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can be used to fact-check black-box models in a zero-resource fashion, i.e. without an external database. SelfCheckGPT leverages the simple idea that if an LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. For hallucinated facts, however, stochastically sampled responses are likely to diverge and contradict one another. We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset, and manually annotate the factuality of the generated passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality. We compare our approach to several existing baselines and show that in sentence-level hallucination detection, our approach has AUC-PR scores comparable to grey-box methods, while in passage factuality assessment SelfCheckGPT performs best.
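The sampling-based idea can be illustrated with a minimal sketch. The abstract does not specify how consistency between the main response and the sampled responses is measured, so the code below substitutes a simple token-overlap proxy; the function names (`support`, `inconsistency_scores`, `passage_score`) and the example texts are hypothetical and are not the scoring used in this work. A sentence whose content rarely reappears in the samples receives a higher inconsistency score, and averaging over sentences gives a passage-level score that could be used for ranking.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(sentence: str, sample: str) -> float:
    """Fraction of the sentence's tokens that also occur in one sample.

    Assumption: lexical overlap is only a crude stand-in for a real
    consistency measure between a sentence and a sampled response.
    """
    sent_tokens = tokenize(sentence)
    if not sent_tokens:
        return 1.0  # an empty sentence makes no claim to contradict
    return len(sent_tokens & tokenize(sample)) / len(sent_tokens)


def inconsistency_scores(sentences: List[str], samples: List[str]) -> List[float]:
    """Score each sentence in [0, 1]; higher means less supported by samples."""
    if not samples:
        raise ValueError("at least one sampled response is required")
    scores = []
    for sentence in sentences:
        mean_support = sum(support(sentence, s) for s in samples) / len(samples)
        scores.append(1.0 - mean_support)
    return scores


def passage_score(sentences: List[str], samples: List[str]) -> float:
    """Average the sentence scores to obtain a passage-level score."""
    scores = inconsistency_scores(sentences, samples)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical main response, split into sentences, plus three
# stochastically sampled responses to the same prompt.
main_sentences = [
    "Ada Lovelace was born in London.",
    "She won the Nobel Prize in 1950.",
]
sampled_responses = [
    "Ada Lovelace was born in London in 1815.",
    "Born in London, Ada Lovelace was a mathematician.",
    "Ada Lovelace, born in London, worked with Babbage.",
]

for sentence, score in zip(
    main_sentences, inconsistency_scores(main_sentences, sampled_responses)
):
    print(f"{score:.3f}  {sentence}")
```

In this toy example the second sentence is not echoed by any sample, so it scores as far less consistent than the first. The sketch needs only the generated text, which matches the black-box, zero-resource setting described above: no token probabilities and no external database are consulted.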