Rethinking GNNs and Missing Features: Challenges, Evaluation and a Robust Solution

Handling missing node features is a key challenge for deploying Graph Neural Networks (GNNs) in real-world domains such as healthcare and sensor networks. Existing studies mostly address relatively benign scenarios, namely benchmark datasets with (a) high-dimensional but sparse node features and (b) incomplete data generated under Missing Completely At Random (MCAR) mechanisms. For (a), we theoretically prove that high sparsity substantially limits the information loss caused by missingness, making all models appear robust and preventing a meaningful comparison of their performance. To overcome this limitation, we introduce one synthetic and three real-world datasets with dense, semantically meaningful features. For (b), we move beyond MCAR and design evaluation protocols with more realistic missingness mechanisms. Moreover, we provide a theoretical background that states explicit assumptions on the missingness process and analyzes their implications for different methods. Building on this analysis, we propose GNNmim, a simple yet effective baseline for node classification with incomplete feature data. Experiments show that GNNmim is competitive with specialized architectures across diverse datasets and missingness regimes.
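The sparsity argument in point (a) can be made concrete with a small simulation. The sketch below is only an intuition aid, not the paper's proof or evaluation protocol. It assumes MCAR masking (every entry is dropped independently with the same probability) followed by zero imputation, and it measures how many entries of the feature matrix actually change. The function names, the feature densities (1% and 100%), and the 50% missing rate are all hypothetical choices made for illustration.

```python
import random


def random_features(n_rows, n_cols, density, rng):
    """Hypothetical feature matrix: each entry is nonzero with probability `density`."""
    return [
        [rng.uniform(0.5, 1.5) if rng.random() < density else 0.0
         for _ in range(n_cols)]
        for _ in range(n_rows)
    ]


def mcar_mask(n_rows, n_cols, p_missing, rng):
    """MCAR mask: True marks a missing entry, drawn independently of the data."""
    return [
        [rng.random() < p_missing for _ in range(n_cols)]
        for _ in range(n_rows)
    ]


def zero_impute(x, mask):
    """Replace every missing entry with 0.0 (an assumed, very simple imputer)."""
    return [
        [0.0 if missing else value for value, missing in zip(row, mask_row)]
        for row, mask_row in zip(x, mask)
    ]


def fraction_changed(x, x_imputed):
    """Share of entries whose value differs after masking and imputation."""
    total = sum(len(row) for row in x)
    changed = sum(
        a != b
        for row_a, row_b in zip(x, x_imputed)
        for a, b in zip(row_a, row_b)
    )
    return changed / total


def demo(seed=0, n_rows=200, n_cols=100, p_missing=0.5):
    """Compare a sparse and a dense feature matrix under the same MCAR rate."""
    rng = random.Random(seed)
    results = {}
    for name, density in [("sparse", 0.01), ("dense", 1.0)]:
        x = random_features(n_rows, n_cols, density, rng)
        mask = mcar_mask(n_rows, n_cols, p_missing, rng)
        results[name] = fraction_changed(x, zero_impute(x, mask))
    return results


if __name__ == "__main__":
    print(demo())
```

With these settings, the sparse matrix changes in only a small fraction of its entries, roughly the missing rate times the density. The dense matrix changes in roughly half of them. Under this simplified view, masking a sparse matrix removes little that zero imputation does not already restore, which is consistent with the abstract's claim that sparse benchmarks make all methods look robust.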

