Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size, assuming only that the joint distribution of the data is permutation invariant. Although this guarantee is attractive, computing such a set is infeasible in most regression problems: the unknown variable $y_{n+1}$ can take infinitely many candidate values, and generating the conformal set requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model that uses only a subset of variables for prediction, and we use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. It is therefore sufficient to enumerate and refit the model only at the change points of the set of active features and to smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.
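To make the cost of retraining per candidate concrete, the following is a minimal sketch of the brute-force baseline that the abstract describes as infeasible in general: the model is refit for every candidate value $z$ of $y_{n+1}$ on a finite grid. It is not the path-following algorithm of the paper. Several choices here are assumptions made for illustration only: a one-feature Lasso (so that each refit has a closed form via soft-thresholding), the absolute residual as conformity score, and all function names (`fit_lasso_1d`, `conformal_pvalue`, `conformal_set_on_grid`), the toy data, and the grid.

```python
def soft_threshold(u, lam):
    """Soft-thresholding operator (closed-form solution map of the 1-D Lasso)."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def fit_lasso_1d(xs, ys, lam):
    """Minimise 0.5 * sum_i (y_i - b * x_i)^2 + lam * |b| over one coefficient b.

    Assumes at least one x_i is non-zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx


def conformal_pvalue(xs, ys, x_new, z, lam):
    """Refit on the data augmented with (x_new, z) and rank the new residual."""
    aug_x = xs + [x_new]
    aug_y = ys + [z]
    b = fit_lasso_1d(aug_x, aug_y, lam)
    # Conformity score (an assumption of this sketch): absolute residual.
    scores = [abs(y - b * x) for x, y in zip(aug_x, aug_y)]
    # Fraction of points whose score is at least that of the candidate point.
    return sum(s >= scores[-1] for s in scores) / len(scores)


def conformal_set_on_grid(xs, ys, x_new, lam, alpha, grid):
    """Keep every grid candidate z whose p-value exceeds alpha (one refit per z)."""
    return [z for z in grid if conformal_pvalue(xs, ys, x_new, z, lam) > alpha]


# Hypothetical toy data: y is roughly 2 * x with small fixed perturbations.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.1, -0.2]
ys = [2.0 * x + e for x, e in zip(xs, noise)]

x_new, lam, alpha = 3.0, 0.1, 0.2
grid = [i / 10 for i in range(121)]  # candidates 0.0, 0.1, ..., 12.0

region = conformal_set_on_grid(xs, ys, x_new, lam, alpha, grid)
print("grid conformal set spans", min(region), "to", max(region))
```

The sketch performs one refit per grid point, so its cost grows with the grid resolution and it only sees the candidates on the grid. This is the limitation that motivates refitting only at the change points of the active set and interpolating in between.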