We study the properties of conformal prediction for network data under various sampling mechanisms that commonly arise in practice but often result in a non-representative sample of nodes. We interpret these sampling mechanisms as selection rules applied to a superpopulation and study the validity of conformal prediction conditional on an appropriate selection event. We show that the sampled subarray is exchangeable conditional on the selection event if the selection rule satisfies a permutation invariance property and a joint exchangeability condition holds for the superpopulation. Our result implies the finite-sample validity of conformal prediction for certain selection events related to ego networks and snowball sampling. We also show that when data are sampled via a random walk on a graph, a variant of weighted conformal prediction yields asymptotically valid prediction sets for an independently selected node from the population.
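To make the random-walk case concrete, the following is a minimal sketch of a weighted split-conformal quantile. It assumes a simple random walk on a connected undirected graph, whose stationary distribution is proportional to node degree, so calibration nodes are reweighted by inverse degree to target a uniformly selected node. The function name `weighted_conformal_quantile`, the inverse-degree weights, the uniform target, and the absolute-residual scores are illustrative assumptions; the variant analyzed in the paper may differ.

```python
import numpy as np


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration scores.

    The test point contributes a point mass at +infinity with weight
    `test_weight`, as in standard weighted conformal prediction.
    All names here are hypothetical, chosen for illustration only.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum() + test_weight

    # Sort scores and accumulate normalized weights.
    order = np.argsort(scores)
    cum = np.cumsum(weights[order]) / total

    # First sorted score whose cumulative mass reaches 1 - alpha.
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    if idx >= len(scores):
        # The required mass falls on the point at +infinity.
        return np.inf
    return scores[order][idx]


# Calibration nodes visited by the walk (toy numbers, not real data).
degrees = np.array([1, 1, 4, 4])
residuals = np.array([1.0, 2.0, 3.0, 4.0])  # |y_i - prediction_i|

# Assumption: inverse-degree weights undo the degree bias of the walk.
cal_weights = 1.0 / degrees
test_weight = 1.0 / 2  # test node with degree 2

q = weighted_conformal_quantile(residuals, cal_weights, test_weight, alpha=0.3)
prediction = 10.0
interval = (prediction - q, prediction + q)
```

With equal weights the function reduces to the usual split-conformal quantile, which is a convenient sanity check on the weighting logic.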