Hallucinations are one of the major issues affecting LLMs, hindering their wide adoption in production systems. While current research on hallucination detection relies mainly on heuristics, in this paper we introduce a mathematically sound methodology for reasoning about hallucinations and leverage it to build a detection tool. To the best of our knowledge, we are the first to show that hallucinated content differs structurally from correct content. To establish this result, we use Minkowski distances in the embedding space. Our findings reveal statistically significant differences in the embedding distance distributions that are also scale free: they hold qualitatively regardless of the distance norm used and of the number of keywords, questions, or responses. We leverage these structural differences to develop a tool that detects hallucinated responses, achieving an accuracy of 66\% for a specific configuration of system parameters, comparable with the best results in the field. In conclusion, the proposed methodology is novel and promising, possibly paving the way for further research in the domain, including the directions highlighted in our future work.
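The Minkowski distance of order p generalizes the Manhattan (p = 1) and Euclidean (p = 2) norms; varying p is what allows the scale-free claim above to be checked across distance norms. The following is a minimal sketch of the metric itself; the toy vectors stand in for real model embeddings and are purely illustrative.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p: (sum_i |x_i - y_i|^p)^(1/p)."""
    assert len(x) == len(y), "vectors must have equal dimension"
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

# Toy "embedding" vectors (illustrative only, not real model embeddings).
u = [1.0, 2.0, 3.0]
v = [4.0, 6.0, 3.0]

print(minkowski(u, v, 1))  # L1 (Manhattan) distance: 7.0
print(minkowski(u, v, 2))  # L2 (Euclidean) distance: 5.0
```

In practice one would compute these distances between embeddings of responses (or keywords/questions) and compare the resulting distance distributions for hallucinated versus correct content.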