Recent language models have a mysterious tendency to generate false but plausible-sounding text. Such "hallucinations" are an obstacle to the usability of language-based AI systems and can harm people who rely upon their outputs. This work shows that there is an inherent statistical reason that pretrained language models hallucinate certain types of facts, one that has nothing to do with the transformer LM architecture or data quality. For "arbitrary" facts whose veracity cannot be determined from the training data, we show that hallucination is necessary for language models that satisfy a statistical calibration condition appropriate for generative language models. Specifically, if the maximum probability of any fact is bounded, we show that the probability of generating a hallucination is close to the fraction of facts that occur exactly once in the training data (a "Good-Turing" estimate), even assuming ideal training data without errors. One conclusion is that models pretrained to be sufficiently good predictors (i.e., calibrated) may require post-training to mitigate hallucinations on the type of arbitrary facts that tend to appear once in the training set. However, our analysis also suggests that there is no statistical reason that pretraining must lead to hallucination on facts that tend to appear more than once in the training data (like references to publications such as articles and books, whose hallucinations have been particularly notable and problematic) or on systematic facts (like arithmetic calculations). Therefore, different architectures and learning algorithms may mitigate these latter types of hallucinations.
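
To make the "Good-Turing" quantity in the abstract concrete, the following is a minimal sketch of how the fraction of facts occurring exactly once could be computed from a list of observed facts. The function name `good_turing_singleton_fraction`, the choice of denominator (total number of observations, as in the classical Good-Turing missing-mass estimate), and the toy data are all assumptions made for illustration; they are not taken from the paper, and the paper's actual bound involves calibration and probability-cap terms that are not modeled here.

```python
from collections import Counter


def good_turing_singleton_fraction(observations):
    """Return (# of distinct facts seen exactly once) / (total # of observations).

    This is the classical Good-Turing estimate of the probability mass of
    unseen items. Using it as a stand-in for the hallucination rate discussed
    in the abstract is an illustrative assumption, not the paper's exact bound.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("observations must be non-empty")
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / total


# Hypothetical toy "training set" of arbitrary facts (labels are made up).
facts = ["fact_a", "fact_a", "fact_b", "fact_c", "fact_c", "fact_c", "fact_d"]
estimate = good_turing_singleton_fraction(facts)
print(f"Singleton fraction: {estimate:.3f}")
```

In this toy data, `fact_b` and `fact_d` each appear once among seven observations, so the estimate is 2/7. Facts that recur, like `fact_a` and `fact_c`, do not contribute, which mirrors the abstract's distinction between facts seen once and facts seen more than once.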