Large language models (LLMs) can suffer from hallucinations when generating text. These hallucinations impede various applications in society and industry by making LLMs untrustworthy. Current LLMs generate text in an autoregressive fashion by predicting and appending text tokens. When an LLM is uncertain about the semantic meaning of the next tokens to generate, it is likely to start hallucinating. Thus, it has been suggested that hallucinations stem from predictive uncertainty. We introduce Semantically Diverse Language Generation (SDLG) to quantify predictive uncertainty in LLMs. SDLG steers the LLM to generate semantically diverse yet likely alternatives for an initially generated text. This approach provides a precise measure of aleatoric semantic uncertainty, detecting whether the initial text is likely to be hallucinated. Experiments on question-answering tasks demonstrate that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in LLMs.
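To make the recipe concrete, below is a minimal sketch of sampling-based semantic uncertainty estimation in the spirit of SDLG. Everything in it is an illustrative assumption rather than the paper's method: plain temperature sampling stands in for SDLG's targeted steering toward semantically diverse yet likely alternatives, gpt2 and microsoft/deberta-large-mnli are placeholder models, and the helpers sample_answers, entails, and semantic_entropy are hypothetical names.

```python
import math

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Generator: any causal LM; "gpt2" is only a lightweight placeholder.
gen_tok = AutoTokenizer.from_pretrained("gpt2")
gen_lm = AutoModelForCausalLM.from_pretrained("gpt2")

# Semantic-equivalence judge: an off-the-shelf NLI model (an assumed choice,
# not prescribed by the abstract).
nli_tok = AutoTokenizer.from_pretrained("microsoft/deberta-large-mnli")
nli_lm = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-large-mnli"
)


def sample_answers(question: str, k: int = 5) -> list[str]:
    """Draw k likely alternatives by plain sampling. SDLG instead steers
    generation toward semantically diverse substitutions, which naive
    sampling only approximates."""
    inputs = gen_tok(question, return_tensors="pt")
    outputs = gen_lm.generate(
        **inputs,
        do_sample=True,
        num_return_sequences=k,
        max_new_tokens=32,
        pad_token_id=gen_tok.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    return [
        gen_tok.decode(o[prompt_len:], skip_special_tokens=True) for o in outputs
    ]


@torch.no_grad()
def entails(premise: str, hypothesis: str) -> bool:
    """True if the NLI model predicts ENTAILMENT (class index 2 for this model)."""
    logits = nli_lm(**nli_tok(premise, hypothesis, return_tensors="pt")).logits
    return logits.argmax(dim=-1).item() == 2


def semantic_entropy(answers: list[str]) -> float:
    """Greedily cluster answers by bidirectional entailment, then take the
    entropy of the cluster-size distribution: many distinct meanings among
    likely alternatives signals high semantic uncertainty."""
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            if entails(ans, cluster[0]) and entails(cluster[0], ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    n = len(answers)
    return -sum(len(c) / n * math.log(len(c) / n) for c in clusters)


answers = sample_answers("Q: Where was Albert Einstein born? A:")
print(f"semantic uncertainty: {semantic_entropy(answers):.3f}")
```

The design point the abstract stresses sits in the sampling step: SDLG replaces the naive sampling in sample_answers with alternatives chosen to be both high-likelihood and semantically different from the initial text, which is what the authors credit for the method's precision and computational efficiency.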