Conformal prediction is a popular technique for constructing prediction intervals with distribution-free coverage guarantees. The coverage is marginal, meaning it only holds on average over the entire population but not necessarily for any specific subgroup. This article introduces posterior conformal prediction (PCP), which generates prediction intervals with both marginal and approximate conditional validity for clusters (or subgroups) naturally discovered in the data. PCP achieves these guarantees by modelling the conditional nonconformity score distribution as a mixture of cluster distributions. Compared to other methods with approximate conditional validity, this approach produces tighter intervals, particularly when the test data is drawn from clusters that are well represented in the validation data. PCP can also be applied to guarantee conditional coverage on user-specified subgroups, in which case it further ensures coverage for underrepresented individuals in each subgroup. When the response variable is categorical, PCP can adjust the coverage level based on the classifier's predictive probabilities, yielding low-cardinality prediction sets if the classifier is well calibrated. We demonstrate enhanced performance on datasets from socioeconomics, materials science, and healthcare.
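To make the distinction between marginal and conditional coverage concrete, the following is a minimal sketch of standard split conformal prediction on synthetic data. It does not implement PCP; it only illustrates the limitation that motivates it. The two-group data generator, the stand-in `predict` function, and the names `simulate` and `conformal_quantile` are illustrative assumptions and are not taken from the article.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic to use
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(scores)[k - 1]


def simulate(n, rng):
    """Hypothetical data: two equally likely subgroups with different noise levels."""
    group = rng.integers(0, 2, size=n)
    x = rng.uniform(-1.0, 1.0, size=n)
    sigma = np.where(group == 1, 3.0, 1.0)  # group 1 is three times noisier
    y = 2.0 * x + sigma * rng.normal(size=n)
    return x, group, y


def predict(x):
    """Stand-in for a fitted regression model (assumed, for illustration only)."""
    return 2.0 * x


rng = np.random.default_rng(0)
alpha = 0.1  # target miscoverage, i.e. 90% nominal coverage

x_cal, g_cal, y_cal = simulate(5000, rng)
x_test, g_test, y_test = simulate(5000, rng)

# Nonconformity score: absolute residual on the held-out calibration set.
q = conformal_quantile(np.abs(y_cal - predict(x_cal)), alpha)

# The prediction interval for each test point is [predict(x) - q, predict(x) + q].
covered = np.abs(y_test - predict(x_test)) <= q

marginal = covered.mean()
cov_g0 = covered[g_test == 0].mean()
cov_g1 = covered[g_test == 1].mean()

print(f"marginal coverage: {marginal:.3f}")
print(f"group 0 coverage:  {cov_g0:.3f}")
print(f"group 1 coverage:  {cov_g1:.3f}")
```

In this simulation the pooled coverage should land close to the nominal 90%, while the noisier group is covered noticeably less often and the quieter group more often. A single score quantile shared by all test points cannot adapt to subgroups whose score distributions differ, which is the gap that methods with approximate conditional validity aim to close.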