Detecting hallucinations in large language model (LLM) outputs is pivotal, yet traditional fine-tuning for this classification task is impeded by an annotation process that is expensive and quickly becomes outdated, especially across numerous vertical domains and in the face of rapid LLM advancements. In this study, we introduce an approach that automatically generates both faithful and hallucinated outputs by rewriting system responses. Experimental findings demonstrate that a T5-base model fine-tuned on our generated dataset outperforms state-of-the-art zero-shot detectors and existing synthetic generation methods in both accuracy and latency, indicating the efficacy of our approach.