Self-detection for Large Language Models (LLMs) seeks to evaluate the trustworthiness of an LLM's output by leveraging the model's own capabilities, thereby alleviating output hallucination. However, existing self-detection approaches only retrospectively evaluate answers generated by the LLM, which typically leads to over-trust in incorrectly generated answers. To address this limitation, we propose a novel self-detection paradigm that considers the comprehensive answer space beyond LLM-generated answers. It thoroughly compares the trustworthiness of multiple candidate answers to mitigate over-trust in incorrect LLM-generated answers. Building on this paradigm, we introduce a two-step framework that first instructs the LLM to reflect on and provide justifications for each candidate answer, and then aggregates the justifications for a comprehensive evaluation of the target answer. The framework can be seamlessly integrated with existing approaches for superior self-detection. Extensive experiments on six datasets spanning three tasks demonstrate its effectiveness.
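To make the two-step framework concrete, the following is a minimal Python sketch of how the reflect-then-aggregate loop could be wired up. It is an illustrative assumption, not the paper's implementation: `call_llm`, the prompt wording, and the tab-separated scoring format are all hypothetical placeholders standing in for whatever completion API and prompt templates are actually used.

```python
# Hypothetical sketch of the two-step self-detection framework.
# call_llm is a placeholder for any chat-completion API; the prompts
# and output format below are illustrative assumptions.

def call_llm(prompt: str) -> str:
    """Stand-in for an LLM completion call (e.g., an API or local model)."""
    raise NotImplementedError

def self_detect(question: str, candidates: list[str]) -> dict[str, float]:
    # Step 1: reflection -- elicit a justification for every candidate
    # answer, covering the full answer space rather than only the
    # model's own generation.
    justifications = {
        ans: call_llm(
            f"Question: {question}\n"
            f"Candidate answer: {ans}\n"
            "Explain why this answer could be correct or incorrect."
        )
        for ans in candidates
    }

    # Step 2: aggregation -- present all justifications jointly and ask
    # the LLM to compare candidates and score their trustworthiness.
    joined = "\n\n".join(
        f"Answer: {a}\nJustification: {j}" for a, j in justifications.items()
    )
    verdict = call_llm(
        f"Question: {question}\n\n{joined}\n\n"
        "For each answer, output one line: answer<TAB>confidence in [0, 1]."
    )

    scores: dict[str, float] = {}
    for line in verdict.strip().splitlines():
        ans, _, conf = line.partition("\t")
        try:
            scores[ans.strip()] = float(conf)
        except ValueError:
            continue  # skip malformed lines returned by the model
    return scores
```

The key design point the sketch highlights is that trustworthiness is assigned by comparing all candidates jointly in step 2, rather than judging the LLM's own answer in isolation.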