Despite recent advances in large language models (LLMs) and their strong performance across numerous benchmarks, recent research has shown that LLMs suffer from hallucinations and unfaithful reasoning. This work studies a specific type of hallucination induced by semantic associations. Specifically, we investigate to what extent LLMs take shortcuts based on keyword or entity biases in the prompt instead of following the correct reasoning path. To quantify this phenomenon, we propose a novel probing method and benchmark called EureQA. We start from questions that LLMs answer correctly with utmost certainty, then recursively mask the important entity with an evidence sentence, asking models to identify the masked entities by following a chain of evidence before answering the question. When constructing the evidence, we purposefully replace semantic clues (entities) that may lead to the correct answer with distractor clues (evidence) that do not lead directly to the correct answer but instead require a chain-like reasoning process. We evaluate whether models can follow the correct reasoning chain instead of taking shortcuts through distractor clues. We find that existing LLMs lack the capabilities needed to follow correct reasoning paths and to resist greedy shortcuts. We show that distractor semantic associations often lead to model hallucination, which is strong evidence calling into question the validity of current LLM reasoning.
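As a rough illustration of the recursive masking described in the abstract, the Python sketch below replaces an entity in a question with a placeholder and an evidence sentence, then masks the entity mentioned in that evidence in turn. It is a minimal sketch under stated assumptions, not the EureQA construction pipeline: the names `Probe`, `build_probe`, and `evidence_for` are hypothetical, the entities and facts are invented toy data, and distractor selection is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    question: str        # question with the key entity masked
    evidence: list[str]  # chain of evidence sentences, one per masked entity


def build_probe(question, entity, evidence_for, depth):
    """Recursively mask `entity` and the entities its evidence mentions.

    `evidence_for` is a hypothetical lookup: entity -> (description, next_entity),
    where `description` mentions `next_entity`.
    """
    masked_question = question
    evidence = []
    text, current = question, entity
    for level in range(depth):
        if current not in evidence_for:
            break  # no evidence available, so stop extending the chain
        placeholder = f"[X{level}]"
        masked = text.replace(current, placeholder)
        if level == 0:
            masked_question = masked  # mask the entity in the question itself
        else:
            evidence[-1] = masked     # mask the entity in the previous evidence
        description, next_entity = evidence_for[current]
        evidence.append(f"{placeholder} is {description}.")
        text, current = evidence[-1], next_entity
    return Probe(masked_question, evidence)


# Invented toy data, for illustration only.
toy_evidence = {
    "Avalon City": ("the city where Mira Stone was born", "Mira Stone"),
    "Mira Stone": ("the author of The Glass Harbor", "The Glass Harbor"),
}

probe = build_probe("What country is Avalon City in?", "Avalon City", toy_evidence, depth=2)
print(probe.question)
for sentence in probe.evidence:
    print(sentence)
```

In this toy setup, a model has to resolve `[X1]` before `[X0]` and only then answer the question; a model that answers from surface associations in the evidence text would be taking the kind of shortcut the benchmark is meant to expose.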