While conformal predictors enjoy rigorous statistical guarantees on their error frequency, the size of their prediction sets is critical to their practical utility. Unfortunately, finite-sample analysis and guarantees for these prediction set sizes are currently lacking. To address this shortfall, we theoretically quantify the expected size of the prediction set under the split conformal prediction framework. Because this exact expression usually cannot be computed directly, we further derive point estimates and high-probability intervals that are easy to compute. Together they provide a practical method for characterizing the expected prediction set size across different possible realizations of the test and calibration data. Finally, we corroborate our results with experiments on real-world datasets for both regression and classification problems.
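To make the quantity under study concrete, the following Python sketch runs split conformal prediction for a toy classifier and records the average prediction set size for several independent realizations of the calibration and test data. It is plain Monte Carlo simulation, not the point estimates or high-probability intervals derived in this work. The synthetic generator `toy_example`, the nonconformity score 1 - p(y | x), and all parameter values (5 classes, 200 calibration points, alpha = 0.1) are illustrative assumptions.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the calibration score to use
    if k > n:
        return math.inf  # too few calibration points: every label is included
    return sorted(scores)[k - 1]


def prediction_set(probs, qhat):
    """Labels whose nonconformity score 1 - p(y | x) does not exceed qhat."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_example(rng, num_classes=5, signal=2.0):
    """Hypothetical synthetic classifier output: the true class gets a logit boost plus noise."""
    y = rng.randrange(num_classes)
    logits = [rng.gauss(0.0, 1.0) + (signal if k == y else 0.0) for k in range(num_classes)]
    return softmax(logits), y


def run_split_conformal(n_cal, n_test, alpha, rng):
    """One realization of calibration and test data; returns (mean set size, coverage)."""
    cal_scores = []
    for _ in range(n_cal):
        probs, y = toy_example(rng)
        cal_scores.append(1.0 - probs[y])
    qhat = conformal_quantile(cal_scores, alpha)

    total_size, covered = 0, 0
    for _ in range(n_test):
        probs, y = toy_example(rng)
        pred_set = prediction_set(probs, qhat)
        total_size += len(pred_set)
        covered += y in pred_set  # bool counts as 0 or 1
    return total_size / n_test, covered / n_test


# The mean set size changes from one calibration/test draw to the next.
rng = random.Random(0)
results = [run_split_conformal(n_cal=200, n_test=500, alpha=0.1, rng=rng) for _ in range(20)]
sizes = [size for size, _ in results]
print(f"mean set size ranges from {min(sizes):.2f} to {max(sizes):.2f} across 20 draws")
```

The spread of `sizes` across draws is an empirical view of the variability in set size that arises from different calibration and test realizations. For regression with absolute-residual scores, the same procedure yields intervals of width 2 × qhat, so the "set size" becomes an interval length.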