SAFER: Risk-Constrained Sample-then-Filter in Large Language Models

As large language models (LLMs) are increasingly deployed in risk-sensitive applications such as real-world open-ended question answering (QA), ensuring the trustworthiness of their outputs has become critical. Existing selective conformal prediction (SCP) methods provide statistical guarantees by constructing prediction sets with a constrained miscoverage rate for correct answers. However, prior work unrealistically assumes that admissible answers for all instances can be obtained via finite sampling, even in open-ended QA scenarios that lack a fixed and finite solution space. To address this, we introduce a two-stage risk control framework comprising abstention-aware sampling and conformalized filtering (SAFER). First, on a held-out calibration set, SAFER calibrates a sampling budget within a maximum sampling cap, using the Clopper-Pearson exact method at a user-specified risk level (i.e., the maximum allowable miscoverage rate of the sampling sets). If the risk level cannot be satisfied within the cap, we abstain; otherwise, the calibrated sampling budget becomes the minimum sampling requirement at test time. Second, we take the calibration instances for which a correct answer is attainable under the calibrated budget and apply the conformal risk control method to determine a statistically valid uncertainty threshold, which filters unreliable distractors out of the candidate set for each test data point. In this stage, SAFER introduces an additional risk level to guide the calculation of the threshold, thereby controlling the risk that correct answers are excluded. Furthermore, we show that SAFER is compatible with various task-specific admission criteria and calibration-test split ratios, highlighting its robustness and high data efficiency.
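
The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state:

- The names `first_hit`, `alpha`, `beta`, `delta`, and all three function names are hypothetical.
- The confidence parameter `delta` for the Clopper-Pearson bound is an added assumption.
- The filtering loss is assumed to be 1 when the threshold removes every admissible candidate of an instance.
- Any correction for scanning several candidate budgets is omitted.
- All calibration data shown is made up.

```python
from math import comb, inf


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); adequate for small calibration sets."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_upper(k, n, delta):
    """One-sided (1 - delta) Clopper-Pearson upper bound on a failure rate,
    given k failures in n trials, found by bisection on the binomial CDF."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid  # CDF still above delta, so the bound lies above mid
        else:
            hi = mid
    return hi


def calibrate_budget(first_hit, max_budget, alpha, delta):
    """Stage 1 (sketch). first_hit[i] is the 1-based index of the first
    admissible sample for calibration item i, or None if none appeared
    within max_budget. Returns the smallest budget whose miscoverage upper
    bound is at most alpha, or None to signal abstention."""
    n = len(first_hit)
    for m in range(1, max_budget + 1):
        failures = sum(1 for h in first_hit if h is None or h > m)
        if clopper_pearson_upper(failures, n, delta) <= alpha:
            return m
    return None


def crc_threshold(scores, beta):
    """Stage 2 (sketch). scores[i] is the lowest uncertainty among the
    admissible candidates of item i; the loss at threshold t is 1 when
    scores[i] > t. Returns the smallest t satisfying the conformal risk
    control condition (losses + 1) / (n + 1) <= beta, or inf if none does."""
    n = len(scores)
    for t in sorted(scores):
        losses = sum(1 for s in scores if s > t)
        if (losses + 1) / (n + 1) <= beta:
            return t
    return inf


# Made-up calibration data: 40 items, 2 of which never yield an admissible answer.
first_hit = [1] * 25 + [2] * 10 + [3] * 3 + [None] * 2
budget = calibrate_budget(first_hit, max_budget=5, alpha=0.2, delta=0.1)

# Made-up uncertainty scores for the 38 items covered under the calibrated budget.
covered_scores = [i / 40 for i in range(1, 39)]
threshold = crc_threshold(covered_scores, beta=0.1)

# At test time: sample `budget` candidates, then keep those at or below the threshold.
test_candidates = [("answer A", 0.35), ("answer B", 0.95)]
kept = [text for text, unc in test_candidates if unc <= threshold]
print(budget, threshold, kept)
```

A return value of `None` from `calibrate_budget` corresponds to the abstention case: no budget within the cap meets the requested risk level on the calibration data. Because the per-instance loss in `crc_threshold` is binary and non-increasing in the threshold, the resulting threshold reduces to a conformal quantile of the per-instance scores.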
