To deploy large language models (LLMs) in high-stakes application domains that require substantively accurate responses to open-ended prompts, we need reliable, computationally inexpensive methods for assessing the trustworthiness of the long-form responses these models generate. Existing approaches, however, often rely on claim-by-claim fact-checking, which is computationally expensive and brittle for long-form responses to open-ended prompts. In this work, we introduce semantic isotropy -- the degree of uniformity across normalized text embeddings on the unit sphere -- and use it to assess the trustworthiness of long-form LLM responses. To do so, we generate several long-form responses, embed them, and estimate their level of semantic isotropy as the angular dispersion of the embeddings on the unit sphere. We find that higher semantic isotropy -- that is, greater embedding dispersion -- reliably signals lower factual consistency across samples. Our approach requires no labeled data, no fine-tuning, and no hyperparameter selection, and can be used with open- or closed-weight embedding models. Across multiple domains, our method consistently outperforms existing approaches in predicting nonfactuality in long-form responses from only a handful of samples -- offering a practical, low-cost way to integrate trust assessment into real-world LLM workflows.
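The procedure described above -- sample several responses, embed them, and measure the angular dispersion of the normalized embeddings on the unit sphere -- can be illustrated with a minimal sketch. The dispersion statistic below (mean pairwise angle between unit embeddings) and the helper names `generate_responses` and `embed` are assumptions for illustration only, not necessarily the paper's exact estimator.

```python
import numpy as np


def semantic_isotropy(embeddings: np.ndarray) -> float:
    """Estimate the angular dispersion of response embeddings on the unit sphere.

    `embeddings` has shape (n_responses, dim). This sketch uses the mean
    pairwise angle as the dispersion statistic; the paper's estimator may differ.
    """
    # Project each embedding onto the unit sphere.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Pairwise cosine similarities, clipped for numerical safety before arccos.
    cos = np.clip(unit @ unit.T, -1.0, 1.0)

    # Mean angle over distinct pairs: larger values mean greater dispersion,
    # i.e., higher semantic isotropy and (per the abstract) lower factual consistency.
    iu = np.triu_indices(len(unit), k=1)
    return float(np.arccos(cos[iu]).mean())


# Hypothetical usage: `generate_responses` and `embed` stand in for an LLM
# sampler and an embedding model (open- or closed-weight).
# responses = generate_responses(prompt, n=8)
# score = semantic_isotropy(embed(responses))   # higher score -> less trustworthy
```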