We consider the problem of constructing distribution-free prediction sets with finite-sample conditional guarantees. Prior work has shown that it is impossible to provide exact conditional coverage universally in finite samples. Thus, most popular methods only provide marginal coverage over the covariates. This paper bridges this gap by defining a spectrum of problems that interpolate between marginal and conditional validity. We motivate these problems by reformulating conditional coverage as coverage over a class of covariate shifts. When the target class of shifts is finite-dimensional, we show how to simultaneously obtain exact finite-sample coverage over all possible shifts. For example, given a collection of protected subgroups, our algorithm outputs intervals with exact coverage over each group. For more flexible, infinite-dimensional classes where exact coverage is impossible, we provide a simple procedure for quantifying the gap between the coverage of our algorithm and the target level. Moreover, by tuning a single hyperparameter, we allow the practitioner to control the size of this gap across shifts of interest. Our methods can be easily incorporated into existing split conformal inference pipelines, and thus can be used to quantify the uncertainty of modern black-box algorithms without distributional assumptions.
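The reformulation of conditional coverage as coverage over covariate shifts can be written out as a short sketch. The notation here is an assumption for illustration and does not come from the abstract: $\hat{C}$ is the prediction set, $(X_{n+1}, Y_{n+1})$ the test point, $1-\alpha$ the target level, $\mathcal{F}$ the class of shifts, and $\mathcal{G}$ a collection of subgroups.

```latex
% Exact conditional coverage (notation assumed for illustration):
\[
  \mathbb{P}\bigl(Y_{n+1} \in \hat{C}(X_{n+1}) \mid X_{n+1} = x\bigr) = 1 - \alpha
  \quad \text{for all } x .
\]
% Equivalent statement: the coverage error is uncorrelated with every
% (integrable) measurable function f of the covariates:
\[
  \mathbb{E}\Bigl[ f(X_{n+1}) \bigl( \mathbf{1}\{Y_{n+1} \in \hat{C}(X_{n+1})\} - (1 - \alpha) \bigr) \Bigr] = 0
  \quad \text{for all measurable } f .
\]
% Relaxation: require the same identity only for f in a chosen class F.
%   F = \{ x \mapsto c : c \in \mathbb{R} \}            gives marginal coverage;
%   F = \mathrm{span}\{ \mathbf{1}\{x \in G\} : G \in \mathcal{G} \}
%                                                       gives coverage on each group G.
% For nonnegative f with E[f(X)] > 0, the identity says that coverage equals
% 1 - \alpha under the covariate shift that reweights the law of X by f.
```

Read this way, enlarging $\mathcal{F}$ moves the requirement from marginal validity toward full conditional validity, which is the spectrum the abstract describes.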