Imputing missing node features in graphs is challenging, particularly at high missing rates. Existing methods based on latent representations or global diffusion often fail to produce reliable estimates and may propagate errors across the graph. We propose FSD-CAP, a two-stage framework designed to improve imputation quality under extreme sparsity. In the first stage, a graph-distance-guided subgraph expansion localizes the diffusion process, and a fractional diffusion operator adjusts the sharpness of propagation according to local structure. In the second stage, the imputed features are refined by class-aware propagation, which uses pseudo-labels and neighborhood entropy to promote consistency. With $99.5\%$ of features missing, FSD-CAP achieves average node classification accuracies across five benchmark datasets of $80.06\%$ (structural missingness) and $81.01\%$ (uniform missingness), close to the $81.31\%$ achieved by a standard GCN with full features. For link prediction under the same setting, it reaches AUC scores of $91.65\%$ (structural) and $92.41\%$ (uniform), compared to $95.06\%$ in the fully observed case. FSD-CAP also outperforms other models on both large-scale and heterophilous datasets.
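The abstract names neighborhood entropy as one of the signals used in the class-aware refinement stage but does not give a formula. As a rough illustration only, one plausible reading is the Shannon entropy of the pseudo-label distribution among each node's neighbors: low entropy indicates a neighborhood that agrees on a class. The sketch below assumes a dense, unweighted adjacency matrix, integer pseudo-labels, and natural logarithms; the function name `neighborhood_entropy` and all of these choices are hypothetical and may differ from what FSD-CAP actually computes.

```python
import numpy as np

def neighborhood_entropy(adj, pseudo_labels, num_classes):
    """Shannon entropy (in nats) of the pseudo-label distribution
    among each node's neighbors. Illustrative assumption, not the
    paper's definition."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(pseudo_labels, dtype=int)
    one_hot = np.eye(num_classes)[labels]          # shape (n, c)
    counts = adj @ one_hot                         # neighbor label counts, shape (n, c)
    totals = counts.sum(axis=1, keepdims=True)     # node degrees, shape (n, 1)
    # Normalize counts to probabilities; isolated nodes keep all-zero rows.
    probs = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    # Treat 0 * log(0) as 0 by only taking logs of positive entries.
    logs = np.log(probs, out=np.zeros_like(probs), where=probs > 0)
    return -(probs * logs).sum(axis=1)

# Path graph 0-1-2-3 with pseudo-labels [0, 0, 1, 1].
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
entropy = neighborhood_entropy(adj, [0, 0, 1, 1], num_classes=2)
```

In this toy graph the two end nodes each have a single neighbor and therefore zero entropy, while the two middle nodes see one neighbor of each class and get entropy $\ln 2$. A refinement step could, under this reading, trust propagation more where the entropy is low.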