Despite recent advances in large language models (LLMs) and their strong performance across numerous benchmarks, recent research has revealed that LLMs suffer from hallucinations and unfaithful reasoning. This work studies a specific type of hallucination induced by semantic associations. Specifically, we investigate to what extent LLMs take shortcuts based on keyword/entity biases in the prompt instead of following the correct reasoning path. To quantify this phenomenon, we propose a novel probing method and benchmark called EureQA. We start from questions that LLMs answer correctly with utmost certainty, and recursively mask the important entity with an evidence sentence, asking models to identify the masked entities by following a chain of evidence before answering the question. When constructing the evidence, we purposefully replace semantic clues (entities) that may lead to the correct answer with distractor clues (evidence) that do not lead directly to the correct answer but instead require a chain-like reasoning process. We evaluate whether models can follow the correct reasoning chain instead of short-cutting through distractor clues. We find that existing LLMs lack the capabilities needed to follow correct reasoning paths and to resist greedy shortcuts. We show that distractor semantic associations often lead to model hallucination, which is strong evidence calling the validity of current LLM reasoning into question.
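The recursive masking described above can be illustrated with a minimal sketch. This is not the EureQA construction pipeline: the function name `mask_recursively`, the `[E0]`-style placeholders, and the toy `facts` table are all assumptions made for illustration, and the sketch covers only the chain-building step, not the selection of distractor clues.

```python
def mask_recursively(question, entity, facts, depth):
    """Replace `entity` in `question` with a placeholder and build a chain of
    evidence sentences, each identifying one placeholder via the next.

    `facts` is a hypothetical lookup: entity -> (sentence template, clue entity).
    Returns (masked_question, evidence_sentences).
    """
    masked_question = question.replace(entity, "[E0]")
    evidence = []
    current = entity
    for level in range(depth):
        template, clue = facts[current]
        # Stop masking at the requested depth, or when no evidence exists for the clue.
        is_last = level == depth - 1 or clue not in facts
        clue_text = clue if is_last else f"[E{level + 1}]"
        evidence.append(template.format(subject=f"[E{level}]", clue=clue_text))
        if is_last:
            break
        current = clue
    return masked_question, evidence


# Toy example (illustrative data, not taken from the benchmark).
facts = {
    "France": ("{subject} is the country whose capital is {clue}.", "Paris"),
    "Paris": ("{subject} is the city where {clue} stands.", "the Eiffel Tower"),
}
masked, chain = mask_recursively(
    "What is the official language of France?", "France", facts, depth=2
)
print(masked)
for sentence in chain:
    print(sentence)
```

With `depth=2`, a model has to resolve `[E1]` from the last sentence before it can identify `[E0]` and answer the question, which is the chain-following behavior the benchmark is meant to probe.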