Deployed conformal predictors are long-lived decision infrastructure that operate over finite operational windows. The real-world question is not only ``Does the true label lie in the prediction set at the target rate?'' (marginal coverage) but also ``How often does the system commit versus defer? What error exposure does it incur when it acts? How do these rates trade off?'' Marginal coverage does not determine these deployment-facing quantities: the same calibrated thresholds can yield different operational profiles depending on the geometry of the scores. We provide a framework for operational certification and planning beyond coverage, with three contributions. (1) Small-Sample Beta Correction (SSBC): we invert the exact finite-sample Beta/rank law of split conformal prediction to map a user request $(\alpha^\star,\delta)$ to a calibrated grid point with PAC-style semantics, yielding explicit finite-window coverage guarantees. (2) Calibrate-and-Audit: since no distribution-free pivot exists for rates beyond coverage, we introduce a two-stage design in which an independent audit set produces a reusable region--label table; linear projections of this table yield certified finite-window envelopes (Binomial/Beta-Binomial) for the operational quantities of commitment frequency, deferral, decisive error exposure, and commit purity. (3) Geometric characterization: we describe the feasibility constraints, regime boundaries (hedging vs.\ rejection), and cost-coherence conditions induced by a fixed conformal partition, explaining why operational rates are coupled and how calibration navigates their trade-offs. The output is an auditable operational menu: for a fixed scoring model, we trace the attainable operational profiles across calibration settings and attach finite-window uncertainty envelopes to them. We demonstrate the approach on Tox21 toxicity prediction (12 endpoints) and on aqueous solubility screening with AquaSolDB.
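The SSBC step in (1) can be illustrated with a minimal sketch. It is not the paper's implementation and rests on two assumptions. First, nonconformity scores are continuous (no ties). Second, it uses the standard split-conformal fact that, when the threshold is the $k$-th smallest of $n$ exchangeable calibration scores, coverage conditional on the calibration set is distributed $\mathrm{Beta}(k, n+1-k)$. Requiring $\Pr[\text{coverage} < 1-\alpha^\star] \le \delta$ then reduces, via $\Pr[\mathrm{Beta}(k,n+1-k) < x] = \Pr[\mathrm{Binomial}(n,x) \ge k]$, to a search over integer ranks. The function names (`ssbc_rank`, `corrected_alpha`, `nominal_rank`, `ssbc_threshold`) are hypothetical. The grid convention $\alpha' = (n+1-k)/(n+1)$ is also an assumption, not the paper's stated definition.

```python
from math import ceil, comb


def binom_sf(k, n, p):
    """P(Binomial(n, p) >= k), by exact summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def ssbc_rank(n, alpha_star, delta):
    """Smallest rank k with P(coverage < 1 - alpha_star) <= delta.

    Assumes coverage ~ Beta(k, n + 1 - k) when the threshold is the k-th
    smallest of n tie-free calibration scores, and uses
    P(Beta(k, n + 1 - k) < x) = P(Binomial(n, x) >= k).
    Returns None when even k = n cannot meet the request.
    """
    target = 1.0 - alpha_star
    for k in range(1, n + 1):
        if binom_sf(k, n, target) <= delta:  # the tail shrinks as k grows
            return k
    return None


def corrected_alpha(n, k):
    """Grid point alpha' = (n + 1 - k) / (n + 1) for rank k (assumed convention)."""
    return (n + 1 - k) / (n + 1)


def nominal_rank(n, alpha):
    """Rank used by uncorrected split conformal: ceil((n + 1)(1 - alpha))."""
    return ceil((n + 1) * (1 - alpha))


def ssbc_threshold(cal_scores, alpha_star, delta):
    """Threshold q for prediction sets {y : s(x, y) <= q}, or None if infeasible."""
    n = len(cal_scores)
    k = ssbc_rank(n, alpha_star, delta)
    return None if k is None else sorted(cal_scores)[k - 1]
```

Take $n=100$ calibration scores and $(\alpha^\star,\delta)=(0.1,0.1)$. The nominal rank is 91, and it fails the PAC condition. `ssbc_rank` therefore returns a larger rank, and `corrected_alpha` returns a grid point below $\alpha^\star$. When $n$ is too small for the request (for example $n=5$ with $\alpha^\star=0.01$), no rank qualifies and the sketch reports infeasibility rather than a threshold.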
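The Calibrate-and-Audit idea in (2) can be sketched for binary labels under several illustrative assumptions. The audit set is tabulated by prediction-set region (empty, $\{0\}$, $\{1\}$, $\{0,1\}$) and by true label. Commitment, deferral, and decisive-error counts are then obtained as 0/1 linear projections of that table. The sketch departs from the paper in a few places:

* It swaps in exact Clopper-Pearson intervals on the underlying rates in place of the paper's Binomial/Beta-Binomial finite-window envelopes.
* It counts both hedged and empty sets as deferral.
* It bounds commit purity (a ratio, not a linear functional) conditional on the observed commit count.

All function and key names are hypothetical.

```python
from math import comb

# Hypothetical region names for binary labels: which labels the set contains.
REGIONS = ("empty", "{0}", "{1}", "{0,1}")


def region_of(s0, s1, q):
    """Region of the conformal set {y : s(x, y) <= q}, given scores s0, s1."""
    in0, in1 = s0 <= q, s1 <= q
    if in0 and in1:
        return "{0,1}"
    if in0:
        return "{0}"
    if in1:
        return "{1}"
    return "empty"


def region_label_table(audit, q):
    """Count audit points by (region, true label); audit = [(s0, s1, y), ...]."""
    table = {(r, y): 0 for r in REGIONS for y in (0, 1)}
    for s0, s1, y in audit:
        table[(region_of(s0, s1, q), y)] += 1
    return table


# Each operational count is a linear projection: a 0/1 weight per table cell.
# Counting both hedged and empty sets as deferral is an assumption.
PROJECTIONS = {
    "commit": lambda r, y: r in ("{0}", "{1}"),
    "defer": lambda r, y: r in ("{0,1}", "empty"),
    "decisive_error": lambda r, y: (r, y) in (("{0}", 1), ("{1}", 0)),
    "correct_commit": lambda r, y: (r, y) in (("{0}", 0), ("{1}", 1)),
}


def project(table, weight):
    """Apply a 0/1 weight to every (region, label) cell and sum the counts."""
    return sum(c for (r, y), c in table.items() if weight(r, y))


def binom_cdf(x, n, p):
    """P(Binomial(n, p) <= x), by exact summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def _bisect_increasing(g, target, iters=60):
    """Solve g(p) = target on [0, 1] for a nondecreasing function g."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, conf=0.95):
    """Exact two-sided Clopper-Pearson interval for a Binomial rate x / n."""
    a = (1 - conf) / 2
    # Lower: P(X >= x | p) = a; P(X >= x) increases with p.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), a)
    # Upper: P(X <= x | p) = a, i.e. P(X > x | p) = 1 - a.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - a)
    return lower, upper


def audit_report(audit, q, conf=0.95):
    """Point estimates and Clopper-Pearson intervals for each operational rate."""
    table = region_label_table(audit, q)
    m = len(audit)  # assumes a nonempty audit set
    counts = {name: project(table, w) for name, w in PROJECTIONS.items()}
    report = {name: (counts[name] / m, clopper_pearson(counts[name], m, conf))
              for name in ("commit", "defer", "decisive_error")}
    # Commit purity is a ratio; its interval here conditions on the commit count.
    if counts["commit"] > 0:
        k, c = counts["correct_commit"], counts["commit"]
        report["commit_purity"] = (k / c, clopper_pearson(k, c, conf))
    return report
```

The table is built once per threshold $q$, so rerunning `audit_report` across calibration settings traces an operational menu from the same audit data. Only the projection weights change between quantities.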