Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLM outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies because no contextual or ground-truth information is available. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly integrate with fact-checking methods, which assess alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpassing existing reference-free metrics and achieving results on par with reference-based metrics.