Calibrated Language Models Must Hallucinate

Recent language models generate false but plausible-sounding text with surprising frequency. Such "hallucinations" are an obstacle to the usability of language-based AI systems and can harm people who rely on their outputs. This work shows that there is an inherent statistical lower bound on the rate at which pretrained language models hallucinate certain types of facts, a bound that has nothing to do with the transformer LM architecture or data quality. For "arbitrary" facts whose veracity cannot be determined from the training data, we show that hallucinations must occur at a certain rate for language models that satisfy a statistical calibration condition appropriate for generative language models. Specifically, if the maximum probability of any fact is bounded, we show that the probability of generating a hallucination is close to the fraction of facts that occur exactly once in the training data (a "Good-Turing" estimate), even assuming ideal training data without errors. One conclusion is that models pretrained to be sufficiently good predictors (i.e., calibrated) may require post-training to mitigate hallucinations on the type of arbitrary facts that tend to appear once in the training set. However, our analysis also suggests that there is no statistical reason that pretraining will lead to hallucination on facts that tend to appear more than once in the training data (like references to publications such as articles and books, whose hallucinations have been particularly notable and problematic) or on systematic facts (like arithmetic calculations). Therefore, different architectures and learning algorithms may mitigate these latter types of hallucinations.

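The quantity the abstract calls a "Good-Turing" estimate, the fraction of facts that occur exactly once in the training data, can be illustrated with a short sketch. The function name `monofact_rate` and the toy data are hypothetical and chosen only for illustration. The sketch assumes the classical Good-Turing convention of dividing the number of singletons by the total number of observations (not by the number of distinct facts), and it assumes each fact has already been extracted as a hashable item. It only computes the statistic; it does not reproduce the analysis that relates this statistic to the hallucination rate.

```python
from collections import Counter


def monofact_rate(observations):
    """Fraction of observations whose fact appears exactly once.

    Assumption: this follows the classical Good-Turing missing-mass
    estimate N1 / n, where N1 is the number of facts seen exactly once
    and n is the total number of observations.
    """
    observations = list(observations)
    if not observations:
        # No data: define the rate as 0.0 to avoid division by zero.
        return 0.0
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observations)


# Toy training set of "facts" (hypothetical labels).
facts = ["a", "a", "b", "c", "c", "c", "d"]
rate = monofact_rate(facts)  # "b" and "d" occur once: 2 of 7 observations
print(f"monofact rate: {rate:.3f}")
```

On this toy data, two of the seven observations are facts seen exactly once, so the estimate is 2/7. In a corpus where most arbitrary facts are repeated, the same statistic would be close to zero, which matches the abstract's remark that the lower bound concerns facts that tend to appear only once.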