It is common in machine learning to estimate a response $y$ given covariate information $x$. However, these point predictions alone do not quantify the uncertainty associated with them. One way to overcome this deficiency is conformal inference, which constructs a set containing the unobserved response $y$ with a prescribed probability. Unfortunately, even with a one-dimensional response, conformal inference remains computationally expensive despite recent encouraging advances. In this paper, we explore multi-output regression, delivering exact derivations of conformal inference $p$-values when the predictive model can be described as a linear function of $y$. Additionally, we propose \texttt{unionCP} and a multivariate extension of \texttt{rootCP} as efficient ways of approximating the conformal prediction region for a wide array of multi-output predictors, both linear and nonlinear, while preserving computational advantages. We also provide both theoretical and empirical evidence of the effectiveness of these methods on real-world and simulated data.
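As a minimal illustration of the conformal idea described above (a sketch of standard split conformal prediction for a one-dimensional response, not the paper's \texttt{unionCP} or \texttt{rootCP} methods; the data, model, and coverage level below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 1-D regression data: y = 2x + Gaussian noise.
n = 500
x = rng.uniform(-1, 1, n)
y = 2 * x + rng.normal(0, 0.3, n)

# Split the data: fit the model on one half, calibrate on the other.
x_fit, y_fit = x[:250], y[:250]
x_cal, y_cal = x[250:], y[250:]

# Simple least-squares line as a stand-in for any predictive model.
beta = np.polyfit(x_fit, y_fit, deg=1)

def predict(t):
    return np.polyval(beta, t)

# Conformity scores on the calibration set: absolute residuals.
scores = np.abs(y_cal - predict(x_cal))

# Finite-sample-corrected (1 - alpha) quantile of the scores.
alpha = 0.1
n_cal = len(scores)
level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
q = np.quantile(scores, level, method="higher")

# The conformal set for a new point x0 is an interval around the
# prediction; it contains the unobserved y0 with probability >= 1 - alpha.
x0 = 0.5
interval = (predict(x0) - q, predict(x0) + q)
print(interval)
```

The calibration step is what delivers the prescribed coverage guarantee without distributional assumptions; the paper's contribution concerns making the analogous region computation tractable when $y$ is multi-dimensional.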