We consider the problem of constructing distribution-free prediction sets with finite-sample conditional guarantees. Prior work has shown that exact conditional coverage cannot be guaranteed universally in finite samples. As a result, most popular methods provide only marginal coverage, that is, coverage on average over the covariates. This paper bridges the gap by defining a spectrum of problems that interpolate between marginal and conditional validity. We motivate these problems by reformulating conditional coverage as coverage over a class of covariate shifts. When the target class of shifts is finite-dimensional, we show how to simultaneously obtain exact finite-sample coverage over all shifts in the class. For example, given a collection of protected subgroups, our algorithm outputs intervals with exact coverage over each group. For more flexible, infinite-dimensional classes, where exact coverage is impossible, we provide a simple procedure for quantifying the gap between the coverage of our algorithm and the target level. Moreover, by tuning a single hyperparameter, the practitioner can control the size of this gap across shifts of interest. Our methods can be easily incorporated into existing split conformal inference pipelines, and thus can be used to quantify the uncertainty of modern black-box algorithms without distributional assumptions.
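To make the subgroup example concrete, the sketch below contrasts a single pooled split conformal threshold with one threshold per group. It is not the algorithm of this paper, which the abstract does not spell out. It is the simpler per-group calibration that applies only when the groups are disjoint, and it guarantees coverage of at least 1 - alpha within each group rather than coverage over a general class of shifts. The data-generating process, the stand-in model `predict`, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points: only an unbounded set is valid.
        return np.inf
    return float(np.sort(scores)[k - 1])


def groupwise_thresholds(scores, groups, alpha):
    """Calibrate one threshold per integer group label, using only that group's scores."""
    return {int(g): conformal_quantile(scores[groups == g], alpha)
            for g in np.unique(groups)}


def simulate(n, rng):
    """Hypothetical data: y = 2x + noise, with noise sd 1 in group 0 and 3 in group 1."""
    x = rng.uniform(-1.0, 1.0, size=n)
    groups = rng.integers(0, 2, size=n)
    noise_sd = np.where(groups == 1, 3.0, 1.0)
    y = 2.0 * x + noise_sd * rng.normal(size=n)
    return x, y, groups


def run_demo(alpha=0.1, n_cal=4000, n_test=10000, seed=0):
    """Compare per-group coverage of a pooled threshold and per-group thresholds."""
    rng = np.random.default_rng(seed)
    predict = lambda x: 2.0 * x  # assumed stand-in for a pre-trained black-box model

    x_cal, y_cal, g_cal = simulate(n_cal, rng)
    x_test, y_test, g_test = simulate(n_test, rng)
    s_cal = np.abs(y_cal - predict(x_cal))    # absolute-residual conformity scores
    s_test = np.abs(y_test - predict(x_test))

    pooled_q = conformal_quantile(s_cal, alpha)
    group_q = groupwise_thresholds(s_cal, g_cal, alpha)

    pooled_cov, group_cov = {}, {}
    for g in (0, 1):
        mask = g_test == g
        # y lies in the interval predict(x) +/- q exactly when its score is at most q.
        pooled_cov[g] = float(np.mean(s_test[mask] <= pooled_q))
        group_cov[g] = float(np.mean(s_test[mask] <= group_q[g]))
    return pooled_cov, group_cov
```

Calling `run_demo()` returns two dictionaries of empirical coverage keyed by group. In this simulated setting the pooled threshold should over-cover the low-noise group and under-cover the high-noise group, while the per-group thresholds should land near the 90% target in both. Per-group calibration stops working once groups overlap or the shifts of interest are not indicator functions, which is the regime the abstract targets.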