Large language models (LLMs) have become pervasive in everyday life. Yet, a fundamental obstacle prevents their use in many critical applications: their propensity to generate fluent, human-quality content that is not grounded in reality. Detecting such hallucinations is therefore of paramount importance. In this work, we propose a new method to flag hallucinated content: MMD-Flagger. It relies on the Maximum Mean Discrepancy (MMD), a non-parametric distance between distributions. At a high level, MMD-Flagger tracks the MMD between the output to inspect and counterparts generated with various temperature parameters. We show empirically that inspecting the shape of this trajectory is sufficient to detect most hallucinations. We benchmark this novel method on machine translation and summarization datasets, on which it exhibits performance competitive with natural baselines.
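For concreteness, the sketch below illustrates the kind of pipeline the abstract describes: embed the output to inspect and batches of regenerations produced at several temperatures, compute the MMD between the output and each batch, and inspect the resulting trajectory. It uses the standard biased estimator of the squared MMD with a Gaussian kernel, MMD^2(X, Y) ≈ mean k(x_i, x_j) + mean k(y_i, y_j) − 2 mean k(x_i, y_j). The embedding representation, kernel bandwidth, temperature grid, and the dip-based flagging heuristic are illustrative assumptions, not necessarily the paper's exact implementation.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of X and Y.
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd_squared(X, Y, sigma=1.0):
    # Biased estimator of the squared MMD between samples X and Y.
    k_xx = gaussian_kernel(X, X, sigma).mean()
    k_yy = gaussian_kernel(Y, Y, sigma).mean()
    k_xy = gaussian_kernel(X, Y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy

def mmd_trajectory(output_emb, samples_by_temp, sigma=1.0):
    # output_emb: (1, d) embedding of the output to inspect.
    # samples_by_temp: {temperature: (n, d) embeddings of n regenerations}.
    return {
        t: mmd_squared(output_emb, S, sigma)
        for t, S in sorted(samples_by_temp.items())
    }

def flag_hallucination(trajectory):
    # Hypothetical shape heuristic: flag when the MMD is not minimized at the
    # lowest temperature, i.e., the output looks closer in distribution to
    # higher-temperature (more random) generations. The actual criterion used
    # by MMD-Flagger may differ.
    temps = sorted(trajectory)
    return min(trajectory, key=trajectory.get) != temps[0]

if __name__ == "__main__":
    # Toy usage with random embeddings in place of real sentence encodings.
    rng = np.random.default_rng(0)
    out = rng.normal(size=(1, 8))
    samples = {t: rng.normal(size=(20, 8)) for t in (0.0, 0.5, 1.0, 1.5)}
    traj = mmd_trajectory(out, samples)
    print(traj, flag_hallucination(traj))
```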