Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We propose to view these behaviors as fallbacks that models exhibit under uncertainty, and we investigate the connection between them. We categorize fallback behaviors (sequence repetitions, degenerate text, and hallucinations) and extensively analyze them in models from the same family that differ in the number of pretraining tokens, the parameter count, or the inclusion of instruction-following training. Our experiments reveal a clear and consistent ordering of fallback behaviors across all these axes: as an LLM becomes more advanced (i.e., trained on more tokens, has more parameters, or is instruction-tuned), its fallback behavior shifts from sequence repetitions to degenerate text and then to hallucinations. Moreover, the same ordering is observed within a single generation, even for the best-performing models: as uncertainty increases, models shift from generating hallucinations to producing degenerate text and then to sequence repetitions. Lastly, we demonstrate that while common decoding techniques, such as random sampling, might alleviate some unwanted behaviors like sequence repetitions, they increase harder-to-detect hallucinations.
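The abstract does not specify how sequence repetitions are detected. As a rough illustration of what this fallback looks like, the sketch below flags a generation whose tail consists of the same block of tokens repeated several times in a row. It is a minimal, hypothetical heuristic, not the authors' method: the function name `repeated_suffix_period`, the whitespace tokenization, and the thresholds `min_repeats` and `max_period` are all assumptions chosen for illustration.

```python
from typing import Optional, Sequence


def repeated_suffix_period(
    tokens: Sequence[str], min_repeats: int = 3, max_period: int = 50
) -> Optional[int]:
    """Hypothetical repetition check (not the paper's method).

    Returns the shortest period p such that the last p * min_repeats tokens
    are the same p-token block repeated min_repeats times, or None if no
    such period exists up to max_period.
    """
    tokens = list(tokens)
    n = len(tokens)
    for period in range(1, max_period + 1):
        if period * min_repeats > n:
            break  # not enough tokens to hold min_repeats copies of this period
        block = tokens[n - period:]  # the final candidate block
        # Compare each of the last min_repeats blocks against the final one.
        if all(
            tokens[n - (i + 1) * period : n - i * period] == block
            for i in range(min_repeats)
        ):
            return period
    return None


if __name__ == "__main__":
    # Assumption: simple whitespace tokenization stands in for a real tokenizer.
    looping = "the cat sat . the cat sat . the cat sat .".split()
    varied = "the cat sat on the mat and then left .".split()
    print(repeated_suffix_period(looping))  # period of the repeated loop
    print(repeated_suffix_period(varied))   # no repetition loop detected
```

Checking only the suffix reflects the abstract's observation that repetitions tend to emerge late in a generation, as uncertainty grows. A stricter or looser detector would simply change `min_repeats`.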