Graph Neural Networks (GNNs) are powerful machine learning models for prediction on graph-structured data. However, GNNs lack rigorous uncertainty estimates, which limits their reliable deployment in settings where errors are costly. We propose conformalized GNN (CF-GNN), which extends conformal prediction (CP) to graph-based models to provide guaranteed uncertainty estimates. Given an entity in the graph, CF-GNN produces a prediction set/interval that provably contains the true label with a pre-defined coverage probability (e.g., 90%). We establish a permutation invariance condition under which CP is valid on graph data and provide an exact characterization of the test-time coverage. Beyond valid coverage, it is crucial to reduce the prediction set size/interval length for practical use. We observe a key connection between non-conformity scores and network structure, which motivates a topology-aware output correction model that learns to update the predictions and produce more efficient prediction sets/intervals. Extensive experiments show that CF-GNN achieves any pre-defined target marginal coverage while reducing the prediction set/interval size by up to 74% relative to the baselines. It also empirically achieves satisfactory conditional coverage over various raw and network features.
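To make the notion of a prediction set with pre-defined coverage concrete, the following is a minimal sketch of generic split conformal prediction for classification. It is not the CF-GNN method itself: it omits the topology-aware correction model, and the score `1 - p(true class)`, the function names `conformal_quantile` and `prediction_sets`, and the synthetic Dirichlet probabilities standing in for GNN softmax outputs are all assumptions made for illustration. Under exchangeability of calibration and test points, this construction gives marginal coverage of at least 1 - alpha.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the threshold score
    if k > n:
        return np.inf  # too few calibration points: keep every label
    return np.sort(scores)[k - 1]


def prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Return a boolean (n_test, n_classes) mask of the labels in each set."""
    # Non-conformity score (an illustrative choice): 1 - probability of the true class.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    q = conformal_quantile(cal_scores, alpha)
    # A label enters the set when its score does not exceed the threshold.
    return (1.0 - test_probs) <= q


if __name__ == "__main__":
    # Synthetic stand-in for model outputs: random class probabilities,
    # with labels drawn from those same probabilities.
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(5), size=2000)
    labels = np.array([rng.choice(5, p=p) for p in probs])

    cal_p, test_p = probs[:1000], probs[1000:]
    cal_y, test_y = labels[:1000], labels[1000:]

    sets = prediction_sets(cal_p, cal_y, test_p, alpha=0.1)
    coverage = sets[np.arange(len(test_y)), test_y].mean()
    print(f"empirical coverage: {coverage:.3f}")
    print(f"average set size:   {sets.sum(axis=1).mean():.2f}")
```

In this sketch the threshold depends only on the calibration scores, so any improvement in set size has to come from the scores themselves, which is the quantity the abstract's output correction model is described as targeting.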