Large language models (LLMs) have become pervasive in our everyday lives. Yet, a fundamental obstacle prevents their use in many critical applications: their propensity to generate fluent, human-quality content that is not grounded in reality. Detecting such hallucinations is thus of paramount importance. In this work, we propose a new method to flag hallucinated content: MMD-Flagger. It relies on Maximum Mean Discrepancy (MMD), a non-parametric distance between distributions. At a high level, MMD-Flagger tracks the MMD between the output under inspection and counterparts generated with various temperature parameters. We show empirically that inspecting the shape of this trajectory is sufficient to detect most hallucinations. This novel method is benchmarked on machine translation and summarization datasets, on which it exhibits competitive performance relative to natural competitors.
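The abstract describes the method only at a high level. The sketch below illustrates how such an MMD trajectory could be computed and inspected, assuming a Gaussian kernel over text embeddings and a simple argmin-based flagging rule; the kernel, the embedding unit, and the decision criterion are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between rows of X and Y.
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd_squared(X, Y, sigma=1.0):
    # Biased (V-statistic) estimate of the squared MMD between samples X and Y.
    kxx = gaussian_kernel(X, X, sigma)
    kyy = gaussian_kernel(Y, Y, sigma)
    kxy = gaussian_kernel(X, Y, sigma)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

def mmd_trajectory(output_embs, samples_by_temp, sigma=1.0):
    # output_embs: (m, d) embeddings of units of the output to inspect
    # (how the output is split into units is an assumption here).
    # samples_by_temp: {temperature: (n, d) embeddings of regenerated counterparts}.
    temps = sorted(samples_by_temp)
    traj = [mmd_squared(output_embs, samples_by_temp[t], sigma) for t in temps]
    return temps, traj

def flag_hallucination(traj):
    # Illustrative decision rule (an assumption, not the paper's criterion):
    # flag the output when the trajectory does not dip at the low-temperature
    # end, i.e. its minimum is attained at a higher temperature.
    return int(np.argmin(traj)) != 0
```

In practice the embeddings would come from some sentence or token encoder and the set of temperatures would be fixed in advance; only the trajectory's shape, not its absolute scale, drives the flagging decision in this sketch.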