Conformal prediction provides prediction sets with coverage guarantees. The informativeness of conformal prediction depends on its efficiency, typically quantified by the expected size of the prediction set. Prior work on the efficiency of conformalized regression commonly treats the miscoverage level $α$ as a fixed constant. In this work, we establish non-asymptotic bounds on the deviation of the prediction set length from the oracle interval length for conformalized quantile and median regression trained via SGD, under mild assumptions on the data distribution. Our bounds of order $\mathcal{O}(1/\sqrt{n} + 1/(α^2 n) + 1/\sqrt{m} + \exp(-α^2 m))$ capture the joint dependence of efficiency on the proper training set size $n$, the calibration set size $m$, and the miscoverage level $α$. The results identify phase transitions in convergence rates across different regimes of $α$, offering guidance for allocating data to control excess prediction set length. Empirical results are consistent with our theoretical findings.
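The quantities in the bound can be made concrete with a small experiment. The block below is a minimal sketch of standard split conformalized quantile regression (CQR) and conformalized median regression. The lower and upper quantile models (or a single median model) are fit by SGD on the pinball loss using the proper training set of size $n$. They are then calibrated on a held-out set of size $m$ at miscoverage level $\alpha$. It is not the paper's implementation. The one-dimensional linear model, the toy data $y = 2x + \varepsilon$ with $\varepsilon \sim N(0,1)$, the step-size schedule, and the helper names (`sgd_pinball`, `conformal_threshold`, `cqr`, `conformal_median`) are all illustrative assumptions. "Oracle" is taken here to mean the width between the true $\alpha/2$ and $1-\alpha/2$ conditional quantiles, which is also an assumption about the paper's definition.

```python
import math
from statistics import NormalDist

import numpy as np


def sgd_pinball(x, y, tau, lr=0.05, epochs=20, seed=0):
    """Fit a linear quantile model q(x) = w*x + b by plain SGD on the pinball loss at level tau.
    (Hypothetical helper: model class and step-size schedule are illustrative choices.)"""
    rng = np.random.default_rng(seed)
    xs, ys = x.tolist(), y.tolist()
    w = b = 0.0
    for epoch in range(epochs):
        step = lr / math.sqrt(epoch + 1)  # step size decays once per epoch
        for i in rng.permutation(len(xs)):
            r = ys[i] - (w * xs[i] + b)
            # Subgradient of the pinball loss w.r.t. the prediction: -tau if r > 0, else 1 - tau.
            g = -tau if r > 0 else 1.0 - tau
            w -= step * g * xs[i]
            b -= step * g
    return w, b


def conformal_threshold(scores, alpha):
    """Split-conformal threshold: the ceil((m + 1)(1 - alpha))-th smallest calibration score.
    If that rank exceeds m (alpha too small for the calibration size), the set is unbounded."""
    m = len(scores)
    k = math.ceil((m + 1) * (1 - alpha))
    return math.inf if k > m else float(np.sort(scores)[k - 1])


def cqr(x_tr, y_tr, x_cal, y_cal, alpha):
    """Conformalized quantile regression: returns (lower model, upper model, threshold)."""
    lo = sgd_pinball(x_tr, y_tr, alpha / 2, seed=0)
    hi = sgd_pinball(x_tr, y_tr, 1 - alpha / 2, seed=1)
    q_lo = lo[0] * x_cal + lo[1]
    q_hi = hi[0] * x_cal + hi[1]
    scores = np.maximum(q_lo - y_cal, y_cal - q_hi)  # CQR conformity scores
    return lo, hi, conformal_threshold(scores, alpha)


def conformal_median(x_tr, y_tr, x_cal, y_cal, alpha):
    """Conformalized median regression: interval med(x) +/- threshold."""
    med = sgd_pinball(x_tr, y_tr, 0.5)
    scores = np.abs(y_cal - (med[0] * x_cal + med[1]))  # absolute residuals
    return med, conformal_threshold(scores, alpha)


# Toy data (assumption): x ~ Uniform(-1, 1), y = 2x + N(0, 1) noise.
rng = np.random.default_rng(42)
n, m, alpha = 2000, 1000, 0.1


def sample(size):
    x = rng.uniform(-1, 1, size)
    return x, 2 * x + rng.normal(0, 1, size)


x_tr, y_tr = sample(n)     # proper training set
x_cal, y_cal = sample(m)   # calibration set
x_te, y_te = sample(20000)  # test set for estimating coverage and length

lo, hi, Q = cqr(x_tr, y_tr, x_cal, y_cal, alpha)
lower = lo[0] * x_te + lo[1] - Q
upper = hi[0] * x_te + hi[1] + Q
cqr_coverage = float(np.mean((y_te >= lower) & (y_te <= upper)))
cqr_length = float(np.mean(upper - lower))

med, Q_med = conformal_median(x_tr, y_tr, x_cal, y_cal, alpha)
med_pred = med[0] * x_te + med[1]
med_coverage = float(np.mean(np.abs(y_te - med_pred) <= Q_med))
med_length = 2 * Q_med

# Oracle length for N(0, 1) noise: width between the alpha/2 and 1 - alpha/2 quantiles.
oracle = 2 * NormalDist().inv_cdf(1 - alpha / 2)
print(f"CQR:    coverage={cqr_coverage:.3f}, length={cqr_length:.3f}, oracle={oracle:.3f}")
print(f"median: coverage={med_coverage:.3f}, length={med_length:.3f}")
```

Rerunning the sketch with different `n`, `m`, and `alpha` gives a rough empirical view of the excess length `cqr_length - oracle`. Note that `conformal_threshold` returns an infinite threshold once $\lceil (m+1)(1-\alpha) \rceil > m$, which illustrates why small $\alpha$ requires a larger calibration set.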