Quantifying the uncertainty of predictions is a core problem in modern statistics. Methods for predictive inference have been developed under a variety of assumptions, often relying, as in standard conformal prediction, on the invariance of the data distribution under special groups of transformations such as permutation groups. Moreover, many existing methods for predictive inference aim to predict unobserved outcomes in sequences of feature-outcome observations. Meanwhile, there is interest in predictive inference under more general observation models (e.g., for partially observed features) and for data satisfying more general distributional symmetries (e.g., rotationally invariant or coordinate-independent observations in physics). Here we propose SymmPI, a methodology for predictive inference when data distributions have general group symmetries in arbitrary observation models. Our methods leverage the novel notion of distributional equivariant transformations, which process the data while preserving their distributional invariances. We show that SymmPI has valid coverage under distributional invariance and characterize its performance under distribution shift, recovering recent results as special cases. We apply SymmPI to predict unobserved values associated with vertices in a network, where the distribution is unchanged under relabelings that preserve the network structure. In several simulations in a two-layer hierarchical model and in an empirical data analysis example, SymmPI performs favorably compared to existing methods.
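For orientation, the permutation-invariant special case mentioned above (standard conformal prediction under exchangeability) can be sketched as follows. This is a minimal illustration of split conformal prediction with absolute-residual scores, not the SymmPI procedure itself. The function names, the fixed predictor `predict`, and the simulated linear model are assumptions made for the example.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((1 - alpha) * (n + 1))-th smallest score, or inf if that rank exceeds n."""
    ordered = sorted(scores)
    n = len(ordered)
    k = math.ceil((1 - alpha) * (n + 1))
    return ordered[k - 1] if k <= n else math.inf


def split_conformal_interval(cal_residuals, y_pred_new, alpha=0.1):
    """Symmetric prediction interval around y_pred_new from calibration residuals."""
    q = conformal_quantile([abs(r) for r in cal_residuals], alpha)
    return y_pred_new - q, y_pred_new + q


def empirical_coverage(n_cal=200, n_test=2000, alpha=0.1, seed=0):
    """Simulate i.i.d. (hence exchangeable) data and return the fraction of covered test outcomes."""
    rng = random.Random(seed)
    predict = lambda x: 2.0 * x  # hypothetical pre-fitted model (assumption)

    def draw():
        # Assumed data-generating model: y = 2x + standard normal noise.
        x = rng.gauss(0.0, 1.0)
        return x, 2.0 * x + rng.gauss(0.0, 1.0)

    cal = [draw() for _ in range(n_cal)]
    residuals = [y - predict(x) for x, y in cal]

    hits = 0
    for _ in range(n_test):
        x, y = draw()
        lo, hi = split_conformal_interval(residuals, predict(x), alpha)
        hits += lo <= y <= hi
    return hits / n_test
```

Calling `empirical_coverage()` should return a value close to the nominal level 1 - alpha when the calibration and test points are exchangeable. The guarantee in this sketch rests entirely on permutation invariance of the data, which is the assumption the methodology above replaces with general group symmetries.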