Transductive conformal prediction addresses simultaneous prediction for multiple data points. Given a desired confidence level, the objective is to construct a prediction set that contains the true outcomes with the prescribed confidence. We demonstrate a fundamental trade-off between confidence and efficiency in transductive methods, where efficiency is measured by the size of the prediction set. Specifically, we derive a strict finite-sample bound showing that, for data with inherent uncertainty, any non-trivial confidence level forces the prediction set size to grow exponentially. The exponent scales linearly with the number of samples and is proportional to the conditional entropy of the data. The bound also includes a second-order term, the dispersion, defined as the variance of the log conditional probability. We show that transductive methods based on an approximation of the conditional distribution can approach this bound. Building on this analysis, we introduce a practical transductive prediction algorithm that outperforms Bonferroni methods.
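To make the two quantities in the bound concrete, the following minimal sketch computes the conditional entropy and the dispersion (variance of the log conditional probability) for a toy discrete model. The function names, the toy distribution, and the use of natural logarithms are illustrative assumptions rather than part of the paper; the sketch shows only the first-order exponent (number of samples times conditional entropy) and does not reproduce the exact bound or its constants.

```python
import math


def conditional_entropy_and_dispersion(p_x, p_y_given_x):
    """Return (H, V) in nats for a discrete model (hypothetical helper).

    p_x          : list of marginal probabilities P(x)
    p_y_given_x  : list of rows, row i holding P(y | x_i) over all labels y
    H is the conditional entropy E[-log P(Y|X)];
    V is the variance of -log P(Y|X) under the joint distribution.
    """
    first_moment = 0.0
    second_moment = 0.0
    for px, row in zip(p_x, p_y_given_x):
        for py in row:
            if py > 0.0:  # zero-probability labels contribute nothing
                info = -math.log(py)
                first_moment += px * py * info
                second_moment += px * py * info ** 2
    return first_moment, second_moment - first_moment ** 2


def first_order_log_set_size(n, entropy):
    """First-order exponent only: log set size grows like n * H (assumed form)."""
    return n * entropy


# Toy model (assumption): one ambiguous input and one deterministic input.
p_x = [0.5, 0.5]
p_y_given_x = [[0.5, 0.5], [1.0, 0.0]]

H, V = conditional_entropy_and_dispersion(p_x, p_y_given_x)
for n in (10, 100, 1000):
    print(n, first_order_log_set_size(n, H))
```

In this toy model only half of the inputs are ambiguous, so the conditional entropy is half of log 2 and the dispersion is strictly positive; for a fully deterministic label both quantities are zero and the first-order exponent vanishes.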