Hallucinations, defined as instances where Large Language Models (LLMs) generate false or misleading content, pose a significant challenge to the safety and trustworthiness of downstream applications. We introduce UQLM, a Python package for LLM hallucination detection using state-of-the-art uncertainty quantification (UQ) techniques. This toolkit offers a suite of UQ-based scorers that compute response-level confidence scores ranging from 0 to 1. The library provides an off-the-shelf solution for UQ-based hallucination detection that can be easily integrated into existing pipelines to enhance the reliability of LLM outputs.
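To make the scoring idea concrete, the sketch below illustrates the simplest form of a black-box consistency scorer: sample several responses to the same prompt and map their mutual agreement to a confidence score in [0, 1]. This is a minimal illustration of the general technique, not the UQLM API; the function name and similarity measure are assumptions chosen for brevity.

```python
import itertools
from difflib import SequenceMatcher


def consistency_score(responses: list[str]) -> float:
    """Map a set of sampled LLM responses to a confidence score in [0, 1].

    Illustrative black-box scorer (not part of UQLM): higher mutual
    agreement across samples is used as a proxy for model confidence.
    """
    if len(responses) < 2:
        return 1.0
    # Average pairwise string similarity over all response pairs.
    pairs = itertools.combinations(responses, 2)
    sims = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(sims) / len(sims)


# A model that answers consistently scores near 1; a wavering one scores low.
print(consistency_score(["Paris", "Paris", "Paris"]))      # ~1.0
print(consistency_score(["Paris", "Lyon", "Marseille"]))   # much lower
```

In practice, UQ-based scorers of this family replace the crude string similarity above with stronger signals such as semantic similarity or entropy over sampled responses, but the output contract is the same response-level score in [0, 1] described in the abstract.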