Graph Neural Networks (GNNs) are powerful machine learning models for prediction on graph-structured data. However, GNNs lack rigorous uncertainty estimates, which limits their reliable deployment in settings where errors are costly. We propose conformalized GNN (CF-GNN), which extends conformal prediction (CP) to graph-based models to provide uncertainty estimates with guarantees. Given an entity in the graph, CF-GNN produces a prediction set/interval that provably contains the true label with a pre-defined coverage probability (e.g., 90%). We establish a permutation invariance condition under which CP is valid on graph data, and we provide an exact characterization of the test-time coverage. Beyond valid coverage, it is crucial for practical use to reduce the prediction set size/interval length. We observe a key connection between non-conformity scores and network structure, which motivates a topology-aware output correction model that learns to update the predictions and produces more efficient prediction sets/intervals. Extensive experiments show that CF-GNN achieves any pre-defined target marginal coverage while reducing the prediction set/interval size by up to 74% relative to the baselines. It also empirically achieves satisfactory conditional coverage over various raw and network features.
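To make the coverage guarantee concrete, the following is a minimal sketch of standard split conformal prediction for classification. It is not the CF-GNN method itself and omits the topology-aware correction model. It assumes class probabilities from some already trained classifier (for example, a GNN's softmax output per node), uses the common non-conformity score of one minus the probability assigned to the true label, and replaces real model outputs with synthetic data whose labels are sampled from the predicted probabilities, so that calibration and test points are exchangeable by construction. The names `conformal_quantile`, `prediction_sets`, and `synthetic_outputs` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(cal_scores)
    # Rank of the order statistic used by split conformal prediction.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points: only the trivial full set is valid.
        return np.inf
    return float(np.sort(cal_scores)[k - 1])


def prediction_sets(probs, qhat):
    """Boolean mask of shape (n, n_classes): True where a label is in the set."""
    return (1.0 - probs) <= qhat


def synthetic_outputs(rng, n, n_classes):
    """Assumed stand-in for a trained model: random softmax outputs, with
    labels sampled from those same probabilities."""
    logits = rng.normal(scale=2.0, size=(n, n_classes))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling of one label per row.
    u = rng.random(n)
    labels = (probs.cumsum(axis=1) < u[:, None]).sum(axis=1)
    # Guard against floating-point round-off in the cumulative sum.
    return probs, np.minimum(labels, n_classes - 1)


alpha = 0.1  # target miscoverage, i.e. 90% target coverage
rng = np.random.default_rng(0)
cal_probs, cal_labels = synthetic_outputs(rng, 500, 5)
test_probs, test_labels = synthetic_outputs(rng, 2000, 5)

# Non-conformity score on calibration data: 1 - probability of the true label.
cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
qhat = conformal_quantile(cal_scores, alpha)
sets = prediction_sets(test_probs, qhat)

coverage = sets[np.arange(len(test_labels)), test_labels].mean()
avg_size = sets.sum(axis=1).mean()
print(f"empirical coverage: {coverage:.3f}, average set size: {avg_size:.2f}")
```

In this sketch the empirical coverage should land near the 90% target, and the average set size is the efficiency measure that the abstract seeks to reduce. On real graph data the exchangeability assumed here does not come for free, which is the role of the permutation invariance condition described above.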