Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. Whereas classical results concern only their marginal distribution, we show that their joint distribution follows a P\'olya urn model, and we establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that may use the covariates of the test and calibration samples at the training stage to increase accuracy. We demonstrate the usefulness of these results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning, and novelty detection based on two-class classification.
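To make the objects in this abstract concrete, here is a minimal Python sketch. It rests on several assumptions. Scores are continuous, so there are no ties. Each $p$-value follows the common convention $p_i = (1 + \#\{j \le n : S_j \ge S_{n+i}\})/(n+1)$ for $n$ calibration scores. The urn starts with one ball for each of the $n+1$ possible $p$-value levels, and every draw returns the ball together with one extra ball of the same colour. All function names are hypothetical and do not come from the paper. The sketch computes the $m$ transductive conformal $p$-values and samples $m$ values from that urn. It also evaluates the empirical distribution function $\hat F_m(t) = m^{-1}\sum_{i=1}^m \mathbf{1}\{p_i \le t\}$. A Monte Carlo check then compares one joint event, "all $m$ $p$-values equal $1/(n+1)$", under real exchangeable scores and under the urn.

```python
import random
from bisect import bisect_left
from typing import List, Sequence, Tuple


def conformal_p_values(cal_scores: Sequence[float], test_scores: Sequence[float]) -> List[float]:
    """Transductive conformal p-values: p_i = (1 + #{j : S_j >= S_{n+i}}) / (n + 1)."""
    n = len(cal_scores)
    cal_sorted = sorted(cal_scores)
    p_values = []
    for s in test_scores:
        # bisect_left gives the index of the first calibration score >= s,
        # so n - index counts the calibration scores that are >= s.
        n_ge = n - bisect_left(cal_sorted, s)
        p_values.append((1 + n_ge) / (n + 1))
    return p_values


def polya_urn_p_values(n: int, m: int, rng: random.Random) -> List[float]:
    """Draw m values from a Polya urn (assumed parametrisation, see lead-in).

    Colour k in {0, ..., n} stands for the p-value level (1 + k) / (n + 1).
    """
    counts = [1] * (n + 1)  # one initial ball per colour
    p_values = []
    for _ in range(m):
        k = rng.choices(range(n + 1), weights=counts)[0]
        counts[k] += 1  # reinforce the drawn colour
        p_values.append((1 + k) / (n + 1))
    return p_values


def ecdf(p_values: Sequence[float], t: float) -> float:
    """Empirical distribution function of the p-values evaluated at t."""
    return sum(p <= t for p in p_values) / len(p_values)


def joint_min_frequency(n: int, m: int, reps: int, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo frequency of {all m p-values equal 1/(n+1)}: real scores vs. urn."""
    rng = random.Random(seed)
    p_min = 1 / (n + 1)
    hits_scores = hits_urn = 0
    for _ in range(reps):
        # i.i.d. uniform scores are exchangeable and tie-free with probability one
        scores = [rng.random() for _ in range(n + m)]
        p_scores = conformal_p_values(scores[:n], scores[n:])
        hits_scores += ecdf(p_scores, p_min) == 1.0
        hits_urn += ecdf(polya_urn_p_values(n, m, rng), p_min) == 1.0
    return hits_scores / reps, hits_urn / reps


if __name__ == "__main__":
    print(joint_min_frequency(n=3, m=2, reps=20000))
```

For $n = 3$ and $m = 2$, the event requires both test scores to exceed every calibration score. Its exact probability is $2!\,3!/5! = (1/4)(2/5) = 0.1$, so both printed frequencies should land near $0.1$. Note the limits of this check. It probes a single joint event under i.i.d. scores. It does not verify the full distributional identity, and it does not cover adaptive scores.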