Calibrated Language Models Must Hallucinate

Recent language models generate false but plausible-sounding text with surprising frequency. Such "hallucinations" are an obstacle to the usability of language-based AI systems and can harm people who rely upon their outputs. This work shows that there is an inherent statistical lower bound on the rate at which pretrained language models hallucinate certain types of facts, having nothing to do with the transformer LM architecture or data quality. For "arbitrary" facts whose veracity cannot be determined from the training data, we show that hallucinations must occur at a certain rate for language models that satisfy a statistical calibration condition appropriate for generative language models. Specifically, if the maximum probability of any fact is bounded, we show that the probability of generating a hallucination is close to the fraction of facts that occur exactly once in the training data (a "Good-Turing" estimate), even assuming ideal training data without errors. One conclusion is that models pretrained to be sufficiently good predictors (i.e., calibrated) may require post-training to mitigate hallucinations on the type of arbitrary facts that tend to appear once in the training set. However, our analysis also suggests that there is no statistical reason that pretraining will lead to hallucination on facts that tend to appear more than once in the training data (like references to publications such as articles and books, whose hallucinations have been particularly notable and problematic) or on systematic facts (like arithmetic calculations). Therefore, different architectures and learning algorithms may mitigate these latter types of hallucinations.
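The Good-Turing estimate mentioned above can be illustrated with a minimal Python sketch. It computes the classical Good-Turing estimate of the unseen probability mass: the number of items observed exactly once divided by the total number of observations. The function name `good_turing_missing_mass`, the toy "facts", and the choice to count every mention (rather than distinct facts) in the denominator are illustrative assumptions. The sketch does not reproduce the full hallucination bound, which also depends on calibration and on the bound on the maximum probability of any fact.

```python
from collections import Counter


def good_turing_missing_mass(observations):
    """Classical Good-Turing estimate of the unseen ("missing") probability mass.

    Returns (number of items seen exactly once) / (total number of observations).
    Assumption: each list entry is one mention of a fact in the training data.
    """
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(observations)
    # Count the "singletons": facts that appear exactly once.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observations)


# Hypothetical toy corpus: each entry stands for one mention of an arbitrary fact.
facts = ["alice_bday", "bob_bday", "alice_bday", "carol_bday", "dave_bday", "alice_bday"]
print(good_turing_missing_mass(facts))  # 3 singletons / 6 mentions = 0.5
```

In this toy corpus, half of the mentions are of facts seen only once. Under the abstract's argument, that fraction is roughly the hallucination rate a calibrated model would exhibit on such arbitrary facts.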