Open-set graph learning is a practical task that aims to classify nodes of known classes and to identify samples of unknown classes as unknown. Conventional node classification methods usually perform unsatisfactorily in open-set scenarios because of the complex data they encounter, such as out-of-distribution (OOD) data and in-distribution (IND) noise. OOD data are samples that do not belong to any known class. They are outliers if they occur in training (OOD noise) and open-set samples if they occur in testing. IND noise consists of training samples that are assigned incorrect labels. IND noise and OOD noise are prevalent and usually cause an ambiguity problem, which includes the intra-class variety problem and the inter-class confusion problem. Exploring robust open-set learning methods is therefore both necessary and difficult, and it becomes even more difficult for non-IID graph data. To this end, we propose a unified framework named ROG$_{PL}$ that achieves robust open-set learning on complex noisy graph data by introducing prototype learning. Specifically, ROG$_{PL}$ consists of two modules: denoising via label propagation and open-set prototype learning via regions. The first module corrects noisy labels through similarity-based label propagation and removes low-confidence samples, to solve the intra-class variety problem caused by noise. The second module learns open-set prototypes for each known class via non-overlapping regions and retains both interior and border prototypes to remedy the inter-class confusion problem. The two modules are iteratively updated under the constraints of a classification loss and a prototype diversity loss. To the best of our knowledge, the proposed ROG$_{PL}$ is the first robust open-set node classification method for graph data with complex noise.
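The abstract names the denoising step (similarity-based label propagation followed by removal of low-confidence samples) but does not give its update rule. The following is a minimal sketch of one way such a step could look, not the ROG$_{PL}$ implementation. It assumes cosine similarity restricted to graph edges, a standard damped propagation update, and a fixed confidence threshold; the function name `propagate_and_filter` and the parameters `alpha`, `num_iters`, and `min_confidence` are hypothetical.

```python
import numpy as np

def propagate_and_filter(features, adjacency, noisy_labels, num_classes,
                         alpha=0.8, num_iters=20, min_confidence=0.7):
    """Hypothetical sketch: similarity-weighted label propagation,
    followed by a confidence filter. Not the paper's exact procedure."""
    # Cosine similarity between node features, kept only on existing edges
    # (assumption: negative similarities are clipped to zero).
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    weights = np.clip(unit @ unit.T, 0.0, None) * adjacency
    # Row-normalize so each node takes a weighted average over its neighbors.
    weights = weights / (weights.sum(axis=1, keepdims=True) + 1e-12)

    # One-hot encoding of the given (possibly noisy) training labels.
    seeds = np.eye(num_classes)[noisy_labels]
    scores = seeds.copy()
    for _ in range(num_iters):
        # Mix neighbor evidence with the original label at every step.
        scores = alpha * (weights @ scores) + (1.0 - alpha) * seeds

    probs = scores / (scores.sum(axis=1, keepdims=True) + 1e-12)
    corrected = probs.argmax(axis=1)             # relabel by propagated majority
    keep = probs.max(axis=1) >= min_confidence   # drop low-confidence nodes
    return corrected, keep

# Toy usage: two disconnected 5-node cliques, one node per clique feature,
# with the last node of the first clique deliberately mislabeled.
features = np.array([[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5)
adjacency = np.zeros((10, 10))
adjacency[:5, :5] = 1.0
adjacency[5:, 5:] = 1.0
np.fill_diagonal(adjacency, 0.0)
noisy_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
corrected, keep = propagate_and_filter(features, adjacency, noisy_labels, num_classes=2)
```

In this toy setup the mislabeled node is relabeled to the class of its neighbors, and because its propagated confidence stays below the threshold it is also flagged for removal. How the two decisions are combined in practice (relabel, remove, or both) is a design choice the abstract leaves open.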