The prevalent use of large language models (LLMs) across various domains has drawn attention to the issue of "hallucination," in which LLMs generate factually inaccurate or ungrounded information. Existing techniques for detecting hallucinations in language assistants rely either on intricate, fuzzy, and highly specific chain-of-thought (CoT) techniques based on free-form language, or on parameter-based methods that suffer from interpretability issues. Additionally, methods that identify hallucinations after generation cannot prevent them from occurring, and their performance is inconsistent because it depends on the instruction format and the model's style. In this paper, we introduce a novel pre-detection self-evaluation technique, referred to as {\method}, which evaluates the model's familiarity with the concepts present in the input instruction and withholds the response when it encounters unfamiliar concepts. This approach emulates the human ability to refrain from responding on unfamiliar topics, thus reducing hallucinations. We validate {\method} on four different LLMs and find that it consistently outperforms existing techniques. Our findings point toward a significant shift to preemptive strategies for hallucination mitigation in LLM assistants, promising improvements in reliability, applicability, and interpretability.
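The pre-detection flow described above (score familiarity with each concept in the instruction, then withhold the response if any concept is unfamiliar) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: concept extraction is assumed to have already happened, `familiarity` is a hypothetical scorer standing in for the model's self-evaluation, and aggregating with `min` and the `0.5` threshold are choices made only for this sketch. The toy scores are illustrative placeholders, not measured results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Verdict:
    scores: Dict[str, float]  # per-concept familiarity in [0, 1]
    overall: float            # aggregated familiarity for the whole instruction
    should_answer: bool       # False means the response is withheld


def pre_detect(concepts: List[str], familiarity: Callable[[str], float],
               threshold: float = 0.5) -> Verdict:
    """Self-evaluate familiarity with each concept BEFORE any generation."""
    scores = {c: familiarity(c) for c in concepts}
    # Assumed aggregation: the least familiar concept decides, so a single
    # unfamiliar concept is enough to withhold the response.
    overall = min(scores.values()) if scores else 1.0
    return Verdict(scores, overall, overall >= threshold)


def respond(instruction: str, concepts: List[str],
            familiarity: Callable[[str], float],
            generate: Callable[[str], str],
            threshold: float = 0.5) -> Optional[str]:
    """Generate only if pre-detection passes; otherwise return None."""
    verdict = pre_detect(concepts, familiarity, threshold)
    if not verdict.should_answer:
        return None  # withhold instead of risking a hallucinated answer
    return generate(instruction)


# Hypothetical stand-ins for the model's self-evaluation and generation.
TOY_SCORES = {"photosynthesis": 0.9, "Zorblax theorem": 0.1}


def toy_familiarity(concept: str) -> float:
    return TOY_SCORES.get(concept, 0.0)  # unseen concepts count as unfamiliar


def toy_generate(instruction: str) -> str:
    return f"ANSWER: {instruction}"
```

For example, `respond("Explain photosynthesis", ["photosynthesis"], toy_familiarity, toy_generate)` returns `"ANSWER: Explain photosynthesis"`, while an instruction whose only concept is `"Zorblax theorem"` returns `None`. The key design point is that the check runs before `generate` is ever called, so an unfamiliar instruction produces no output to verify afterward.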