This study addresses the pervasive challenge of quantifying uncertainty in large language models (LLMs) without access to logits. Conformal Prediction (CP), which is model-agnostic and distribution-free, is a desirable approach across different LLMs and data distributions. However, existing CP methods for LLMs typically assume access to the logits, which are unavailable for some API-only LLMs. In addition, logits are known to be miscalibrated, which can degrade CP performance. To tackle these challenges, we introduce a novel CP method that (1) is tailored for API-only LLMs without logit access; (2) minimizes the size of prediction sets; and (3) provides a statistical guarantee of the user-defined coverage. The core idea is to formulate nonconformity measures using both coarse-grained (i.e., sample frequency) and fine-grained (e.g., semantic similarity) notions of uncertainty. Experimental results on both close-ended and open-ended Question Answering tasks show that our approach outperforms the logit-based CP baselines in most cases.
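To make the coarse-grained idea concrete, the following is a minimal sketch of split conformal prediction in which the nonconformity score of an answer is one minus its frequency among repeated samples from the model. It is an illustration under stated assumptions, not the method proposed here: the function names (`frequency_scores`, `calibrate_threshold`, `prediction_set`) are hypothetical, answers are compared by exact string match, and the fine-grained semantic-similarity component is omitted entirely.

```python
import math
from collections import Counter


def frequency_scores(samples):
    """Map each distinct sampled answer to a nonconformity score, 1 - frequency.

    `samples` is a list of answers drawn from the model for one question
    (assumption: answers are compared by exact string equality).
    """
    counts = Counter(samples)
    total = len(samples)
    return {answer: 1.0 - count / total for answer, count in counts.items()}


def calibrate_threshold(calibration, alpha):
    """Compute the split-conformal threshold from held-out calibration data.

    `calibration` is a list of (samples, true_answer) pairs.
    `alpha` is the tolerated miscoverage rate (target coverage is 1 - alpha).
    """
    scores = []
    for samples, truth in calibration:
        # A true answer that was never sampled gets the maximal score 1.0.
        scores.append(frequency_scores(samples).get(truth, 1.0))
    scores.sort()
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: no finite threshold.
        return float("inf")
    return scores[rank - 1]


def prediction_set(samples, threshold):
    """Keep every sampled answer whose score does not exceed the threshold."""
    return {a for a, s in frequency_scores(samples).items() if s <= threshold}
```

One limitation of this sketch is worth stating: an answer that never appears among the samples cannot enter the prediction set, so the coverage statement is only meaningful when the calibrated threshold is below 1. Frequency alone also produces many tied scores, which is one reason a finer-grained signal such as semantic similarity can help shrink the sets.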