Conformal Tradeoffs: Guarantees Beyond Coverage

Deployed conformal predictors are long-lived decision infrastructure reused over finite operational windows. In practice, stakeholders care not only about marginal coverage, but also about operational quantities: how often the system commits versus defers, and what error exposure it induces when it acts. These deployment-facing quantities are not determined by coverage alone: identical calibrated thresholds can yield markedly different operational profiles depending on score geometry. We develop tools for operational certification and planning beyond coverage for split conformal prediction. First, Small-Sample Beta Correction (SSBC) inverts the exact finite-sample rank/Beta law to map a user request $(α^\star,δ)$ to a concrete calibration grid point with PAC-style semantics, yielding explicit finite-window coverage guarantees for a reused deployed rule. Second, because no distribution-free pivot exists beyond coverage, we propose Calibrate-and-Audit: an independent audit set supports certified finite-window predictive envelopes (Binomial/Beta-Binomial) for key operational quantities -- commitment frequency, deferral, and decisive error exposure -- and related metrics via linear projection, without committing to a scalar objective. Third, we give a geometric characterization of the feasibility constraints and regime boundaries induced by a fixed conformal partition, clarifying why operational quantities are coupled and how calibration navigation trades them off. The result is an operational menu that traces attainable operational profiles (Pareto trade-offs) and attaches finite-window uncertainty envelopes to each regime. We illustrate the approach on benchmark molecular toxicity and aqueous solubility datasets.
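As an illustration of the rank/Beta inversion behind SSBC, the following is a minimal sketch and not the authors' implementation. It assumes exchangeable data and continuous (tie-free) nonconformity scores, with the threshold set at the $k$-th smallest of $n$ calibration scores, so that the coverage of the resulting fixed rule follows a $\mathrm{Beta}(k, n+1-k)$ law. The function names `pac_coverage_prob` and `ssbc_rank` are hypothetical. The sketch searches for the smallest rank $k$ at which coverage is at least $1-α^\star$ with probability at least $1-δ$.

```python
from math import comb


def pac_coverage_prob(n, k, alpha_star):
    """P(coverage >= 1 - alpha_star) when coverage ~ Beta(k, n + 1 - k).

    Assumes continuous scores and a threshold at the k-th smallest of the
    n calibration scores (illustrative assumptions, not from the source).
    """
    p = 1.0 - alpha_star
    # Identity: P(Beta(k, n+1-k) <= p) = P(Bin(n, p) >= k),
    # hence P(Beta(k, n+1-k) >= p) = P(Bin(n, p) <= k - 1).
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))


def ssbc_rank(n, alpha_star, delta):
    """Smallest rank k in 1..n meeting the (alpha_star, delta) request.

    Returns None when no rank satisfies the request, i.e. the calibration
    set is too small for the requested guarantee.
    """
    for k in range(1, n + 1):
        if pac_coverage_prob(n, k, alpha_star) >= 1.0 - delta:
            return k
    return None


if __name__ == "__main__":
    n, alpha_star, delta = 100, 0.10, 0.10
    k = ssbc_rank(n, alpha_star, delta)
    print("selected rank:", k)
    print("PAC probability at that rank:", pac_coverage_prob(n, k, alpha_star))
```

Under these assumptions the selected rank is more conservative than the usual marginal choice $\lceil (n+1)(1-α^\star) \rceil$, because it controls the probability that the fixed rule's coverage falls short rather than coverage on average. For very small $n$ the request can be infeasible, which the sketch reports by returning `None`.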
