With the development of applications such as social networks and knowledge graphs, graph data has become ubiquitous in the real world. Unfortunately, graph data is often partially absent because of privacy-protection policies or copyright restrictions during data collection. The absence of graph data can be roughly categorized into attribute-incomplete and attribute-missing circumstances. Specifically, attribute-incomplete means that the attribute vectors of all nodes are partially incomplete, while attribute-missing means that the entire attribute vectors of some nodes are missing. Although considerable effort has been devoted to this problem, no existing method is designed for the common situation in which both types of graph data absence occur simultaneously. To fill this gap, we develop a novel network termed Revisiting Initializing Then Refining (RITR), which completes both attribute-incomplete and attribute-missing samples under the guidance of a novel initializing-then-refining imputation criterion. To complete attribute-incomplete samples, we first initialize the incomplete attributes with Gaussian noise before network learning, and then introduce a structure-attribute consistency constraint that refines the incomplete values by making a structure-attribute correlation matrix approximate a high-order structural matrix. To complete attribute-missing samples, we first adopt the structure embeddings of the attribute-missing samples as the embedding initialization, and then refine these initial values by adaptively aggregating the reliable information of attribute-incomplete samples according to a dynamic affinity structure. To the best of our knowledge, this is the first unsupervised framework dedicated to handling hybrid-absent graphs. Extensive experiments on four datasets show that our method consistently outperforms existing state-of-the-art competitors.
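To make the two absence types and the Gaussian-noise initialization step concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the function names, the absence ratios, and the choice to zero out attribute-missing rows as a placeholder are all assumptions made for illustration. The refinement stage (the structure-attribute consistency constraint and the affinity-based aggregation) is not shown.

```python
import numpy as np


def make_hybrid_absent(X, missing_node_ratio=0.3, incomplete_entry_ratio=0.3, seed=0):
    """Simulate a hybrid-absent attribute matrix (hypothetical helper).

    Returns:
        node_missing: bool array of shape (n,), True for attribute-missing nodes.
        entry_mask:   bool array of shape (n, d), True where an entry is observed.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Attribute-missing: the whole attribute vector of some nodes is absent.
    n_missing = int(round(missing_node_ratio * n))
    node_missing = np.zeros(n, dtype=bool)
    node_missing[rng.choice(n, size=n_missing, replace=False)] = True

    # Attribute-incomplete: individual entries of the remaining nodes are absent.
    entry_mask = rng.random((n, d)) >= incomplete_entry_ratio
    entry_mask[node_missing] = False  # nothing is observed for missing nodes
    return node_missing, entry_mask


def gaussian_init(X, entry_mask, node_missing, seed=0):
    """Fill unobserved entries with standard Gaussian noise before training.

    Rows of attribute-missing nodes are zeroed here as a placeholder
    (an assumption); the abstract initializes those nodes from structure
    embeddings instead, which this sketch does not compute.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(X.shape)
    X_init = np.where(entry_mask, X, noise)
    X_init[node_missing] = 0.0
    return X_init


# Usage on a toy attribute matrix with 10 nodes and 4 attributes.
X = np.ones((10, 4))
node_missing, entry_mask = make_hybrid_absent(X)
X_init = gaussian_init(X, entry_mask, node_missing)
```

Observed entries pass through unchanged, so a downstream model can treat only the noise-initialized positions as values to be refined.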