We develop new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in conformal language modeling identifies a subset of the text that satisfies a high-probability guarantee of correctness. These methods work by filtering claims from the LLM's original response whenever a scoring function evaluated on the claim fails to exceed a threshold calibrated via split conformal prediction. Existing methods in this area suffer from two deficiencies. First, the stated guarantee is not conditionally valid: the trustworthiness of the filtering step may vary with the topic of the response. Second, because the scoring function is imperfect, the filtering step can remove many valuable and accurate claims. We address both of these challenges via two new conformal methods. First, we generalize the conditional conformal procedure of Gibbs et al. (2023) to adaptively issue weaker guarantees when doing so is necessary to preserve the utility of the output. Second, we show how to systematically improve the quality of the scoring function via a novel algorithm for differentiating through the conditional conformal procedure. We demonstrate the efficacy of our approach on biography and medical question-answering datasets.
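To make the filtering step concrete, the following is a minimal sketch of split conformal calibration for claim filtering, under the standard construction in which each calibration response's conformity score is the largest score attained by any incorrect claim in that response (so any threshold above it removes all of the response's incorrect claims). The function names `calibrate_threshold` and `filter_claims` and this particular score construction are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def calibrate_threshold(cal_scores, cal_labels, alpha=0.1):
    """Split conformal calibration of a claim-filtering threshold.

    cal_scores: list of 1-D float arrays; cal_scores[i][j] is the scoring
        function evaluated on claim j of calibration response i.
    cal_labels: list of 1-D boolean arrays; cal_labels[i][j] is True if
        claim j of calibration response i is factually correct.
    Returns a threshold tau such that, for a new exchangeable response,
    all claims with score > tau are correct with probability >= 1 - alpha.
    """
    conformity = []
    for scores, labels in zip(cal_scores, cal_labels):
        wrong = scores[~labels]
        # Smallest threshold that filters every incorrect claim;
        # -inf if the response contains no incorrect claims.
        conformity.append(wrong.max() if wrong.size else -np.inf)
    n = len(conformity)
    # Conservative (n+1)-adjusted empirical quantile.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(conformity, level, method="higher")

def filter_claims(claims, scores, tau):
    """Keep only the claims whose score exceeds the calibrated threshold."""
    return [c for c, s in zip(claims, scores) if s > tau]
```

Under exchangeability of the calibration and test responses, every retained claim is correct with probability at least 1 - alpha; this is the marginal guarantee that the conditional procedure described above refines.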