Foundation models often generate unreliable answers, and heuristic uncertainty estimators fail to fully distinguish correct from incorrect outputs, leaving users to accept erroneous answers without statistical guarantees. We address this through the lens of false discovery rate (FDR) control, ensuring that among all accepted predictions, the proportion of errors does not exceed a target risk level. To this end, we propose LEC, a principled framework that reframes selective prediction as a decision problem governed by a linear expectation constraint over selection and error indicators. Under this formulation, we derive a finite-sample sufficient condition that relies only on a held-out set of exchangeable calibration data, enabling the computation of an FDR-constrained, retention-maximizing threshold. Furthermore, we extend LEC to two-model routing systems: if the primary model's uncertainty exceeds its calibrated threshold, the input is delegated to a subsequent model, while system-level FDR control is maintained. Experiments on both closed-ended and open-ended question answering (QA) and visual question answering (VQA) demonstrate that LEC achieves tighter FDR control and substantially improves sample retention compared to prior approaches.
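The calibration step described above can be sketched in code. This is a minimal illustration, not LEC's actual algorithm: the function name `calibrate_threshold` is hypothetical, and the `+1` finite-sample correction is one generic sufficient condition standing in for the paper's derived condition. It scans candidate thresholds over a held-out calibration set of (uncertainty score, error indicator) pairs and returns the most permissive (retention-maximizing) threshold whose corrected empirical FDR stays below the target level.

```python
import numpy as np

def calibrate_threshold(scores, errors, alpha):
    """Pick the largest-retention uncertainty threshold whose
    finite-sample-corrected empirical FDR on the calibration set
    is at most alpha.

    scores : array of uncertainty scores (lower = more confident)
    errors : array of 0/1 error indicators for each calibration answer
    alpha  : target FDR level

    NOTE: the (errors + 1) / (accepted + 1) correction below is a
    generic illustrative choice, not LEC's exact sufficient condition.
    """
    scores = np.asarray(scores, dtype=float)
    errors = np.asarray(errors, dtype=float)
    best = -np.inf  # accept nothing if no threshold satisfies the bound
    for t in np.unique(scores):          # candidate thresholds
        accept = scores <= t             # accept low-uncertainty answers
        n_acc = accept.sum()
        # corrected empirical FDR among accepted calibration points
        fdr_hat = (errors[accept].sum() + 1.0) / (n_acc + 1.0)
        if fdr_hat <= alpha and t > best:
            best = t
    return best
```

At test time, an answer is returned only when its uncertainty score falls at or below the calibrated threshold; in the two-model routing extension, inputs above the primary model's threshold would instead be delegated to the secondary model, which applies its own calibrated threshold.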