While split conformal prediction guarantees marginal coverage, approaching the stronger property of conditional coverage is essential for reliable uncertainty quantification. Naive conformal scores, however, suffer from poor conditional coverage in heteroskedastic settings. In univariate regression, this is commonly addressed by normalizing non-conformity scores using an estimated local score variance. In this work, we propose a natural extension of this normalization to the multivariate setting, effectively whitening the residuals to decouple output correlations and standardize local variance. Furthermore, we derive a sufficient condition characterizing a broad class of distributions for which standardized residuals yield asymptotic conditional coverage. We demonstrate that using the Mahalanobis distance induced by a learned local covariance as a non-conformity score provides a closed-form, computationally efficient mechanism for capturing inter-output correlations and heteroskedasticity, avoiding the expensive sampling required by previous methods based on cumulative distribution functions. This structure unlocks several practical extensions, including the handling of missing output values, the refinement of conformal sets when partial information is revealed, and the construction of valid conformal sets for transformations of the output. Finally, we provide extensive empirical evidence on both synthetic and real-world datasets showing that our approach yields conformal sets that improve upon the conditional coverage of existing multivariate baselines.
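To make the scoring idea concrete, the following is a minimal sketch of a Mahalanobis non-conformity score used inside split conformal prediction. It is not the authors' implementation: the abstract does not specify how the local covariance is learned, so the sketch assumes a per-sample covariance estimate is already available. The function names (`mahalanobis_scores`, `conformal_quantile`, `in_conformal_set`) and the synthetic data are hypothetical and chosen only for illustration. The score is taken to be $s(x, y) = \sqrt{(y - \hat{\mu}(x))^\top \hat{\Sigma}(x)^{-1} (y - \hat{\mu}(x))}$, and the calibration step uses the standard finite-sample split conformal quantile.

```python
import numpy as np


def mahalanobis_scores(residuals, covs):
    """Return s_i = sqrt(r_i^T Sigma_i^{-1} r_i) for each sample.

    residuals: array of shape (n, d), the residuals y_i - mu_hat(x_i).
    covs: array of shape (n, d, d), assumed local covariance estimates.
    """
    # Solve Sigma_i z_i = r_i for every sample instead of inverting Sigma_i.
    z = np.linalg.solve(covs, residuals[..., None])[..., 0]
    return np.sqrt(np.einsum("ij,ij->i", residuals, z))


def conformal_quantile(scores, alpha):
    """Standard split conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the set is the whole space.
        return np.inf
    return np.sort(scores)[k - 1]


def in_conformal_set(y, mean, cov, q):
    """Check whether y lies in the ellipsoid {y : score(y) <= q}."""
    r = y - mean
    return bool(np.sqrt(r @ np.linalg.solve(cov, r)) <= q)


# Synthetic, heteroskedastic calibration data (illustrative assumption only).
rng = np.random.default_rng(0)
n, d = 500, 2
x = rng.uniform(0.5, 2.0, size=n)
base = np.array([[1.0, 0.6], [0.6, 1.0]])      # fixed output correlation
covs = (x ** 2)[:, None, None] * base          # noise scale grows with x
chol = np.linalg.cholesky(covs)                # stacked Cholesky factors
residuals = np.einsum("nij,nj->ni", chol, rng.standard_normal((n, d)))

scores = mahalanobis_scores(residuals, covs)
q = conformal_quantile(scores, alpha=0.1)      # calibrated radius
```

Under these assumptions, the conformal set for a new input is the ellipsoid centered at the predicted mean whose shape follows the local covariance and whose radius is `q`, so membership can be checked in closed form with `in_conformal_set` and no sampling is needed.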