Many risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables, for which the prediction algorithm may report correlated non-conformity scores. In this work, we treat the scores as random vectors and aim to construct prediction sets that account for their joint correlation structure. Drawing from the rich literature on multivariate quantiles and semiparametric statistics, we propose an algorithm to estimate the $1-\alpha$ quantile of the scores, where $\alpha$ is the user-specified miscoverage rate. In particular, we flexibly estimate the joint cumulative distribution function (CDF) of the scores using nonparametric vine copulas and improve the asymptotic efficiency of the quantile estimate using its influence function. The vine decomposition allows our method to scale well to a large number of targets. We observe the desired coverage and competitive efficiency on a range of real-world regression problems, including those with labels missing at random in the calibration set.
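As a rough illustration of what a CDF-based quantile of vector-valued scores can look like, the Python sketch below maps each score vector $s$ to the scalar $\hat F(s)$ through an estimated joint CDF and takes a split-conformal $1-\alpha$ quantile of those scalars. It is not the estimator described above: a plain empirical joint CDF, fit on a separate held-out split, stands in for the nonparametric vine copula, there is no influence-function correction, and missing labels are not handled. The function names (`empirical_joint_cdf`, `conformal_threshold`, `sample_scores`) and the synthetic correlated scores are hypothetical, chosen only for illustration.

```python
import numpy as np

def empirical_joint_cdf(train_scores, query_scores):
    """Empirical joint CDF: for each query vector, the fraction of training
    vectors that are <= it in every coordinate (a stand-in for a vine copula fit)."""
    # train_scores: (n, d); query_scores: (m, d) -> result: (m,)
    le = np.all(train_scores[None, :, :] <= query_scores[:, None, :], axis=2)
    return le.mean(axis=1)

def conformal_threshold(cal_values, alpha):
    """Split-conformal (1 - alpha) quantile of scalar calibration values,
    using the usual ceil((n + 1)(1 - alpha)) finite-sample rank."""
    n = len(cal_values)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: the set is unbounded
    return np.sort(cal_values)[k - 1]

# Hypothetical setup: 3 targets whose errors are strongly correlated.
rng = np.random.default_rng(0)
d = 3
cov = 0.7 * np.ones((d, d)) + 0.3 * np.eye(d)

def sample_scores(n):
    # Synthetic non-conformity scores: absolute residuals of correlated Gaussian errors.
    return np.abs(rng.multivariate_normal(np.zeros(d), cov, size=n))

# The CDF is fit on its own split so that F_hat(S_i) stays exchangeable
# across calibration and test points.
fit_scores, cal_scores, test_scores = sample_scores(500), sample_scores(500), sample_scores(2000)
alpha = 0.1

threshold = conformal_threshold(empirical_joint_cdf(fit_scores, cal_scores), alpha)
# A test score vector lies in the region iff F_hat(s) <= threshold.
coverage = np.mean(empirical_joint_cdf(fit_scores, test_scores) <= threshold)
print(f"threshold={threshold:.3f}, empirical coverage={coverage:.3f}")
```

Because $\hat F$ is nondecreasing in every coordinate, the region $\{s : \hat F(s) \le t\}$ is a downward-closed set bounded by a CDF level set, so it adapts to the dependence between targets rather than taking a coordinate-wise product of intervals. For a test input $x$, the prediction set would contain every candidate $y$ whose score vector $s(x, y)$ falls in this region.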