Large Language Models (LLMs) may hallucinate and generate fake information, despite being pre-trained on factual data. Inspired by the journalistic device of "according to sources", we propose according-to prompting: directing LLMs to ground their responses in previously observed text. To quantify this grounding, we propose a novel evaluation metric, QUIP-Score, which measures the extent to which model-produced answers are found directly in the underlying text corpora. Experiments on Wikipedia show that these prompts improve grounding under our metrics, often with the added benefit of improving end-task performance. Furthermore, prompts that ask the model to decrease grounding (or to ground to other corpora) do reduce grounding, indicating that language models can increase or decrease the groundedness of their generations on request.
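The abstract describes QUIP-Score only as measuring how much of an answer is found directly in a corpus. The sketch below illustrates that idea under a stated assumption: it scores an answer as the precision of its overlapping character n-grams that also occur verbatim in the corpus. The n-gram length, the function names (`char_ngrams`, `build_corpus_index`, `quip_like_score`), and the exact in-memory set are hypothetical choices for illustration, not the authors' implementation. A corpus the size of Wikipedia would need a more compact membership structure.

```python
# Hypothetical sketch of a QUIP-Score-like metric: character n-gram precision
# against a corpus. Names and the exact-set index are illustrative assumptions.


def char_ngrams(text: str, n: int) -> list[str]:
    """Return all overlapping character n-grams of `text` (empty if len(text) < n)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_corpus_index(documents: list[str], n: int) -> set[str]:
    """Collect every character n-gram that occurs in any corpus document."""
    index: set[str] = set()
    for doc in documents:
        index.update(char_ngrams(doc, n))
    return index


def quip_like_score(answer: str, corpus_index: set[str], n: int) -> float:
    """Fraction of the answer's character n-grams found verbatim in the corpus."""
    grams = char_ngrams(answer, n)
    if not grams:
        # Answers shorter than n have no n-grams; treat them as ungrounded.
        return 0.0
    hits = sum(1 for g in grams if g in corpus_index)
    return hits / len(grams)


if __name__ == "__main__":
    n = 10  # assumed n-gram length, chosen only for this toy example
    corpus = ["The Eiffel Tower is located in Paris, France."]
    index = build_corpus_index(corpus, n)
    print(quip_like_score("The Eiffel Tower is located in Paris.", index, n))
```

In this toy example, every 10-gram of the answer except the final one (" in Paris.", which ends with a period where the corpus has a comma) appears in the corpus, so the score is close to but below 1. A higher score means more of the answer is quoted directly from the corpus.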