Open-set graph learning is a practical task that aims to classify nodes of known classes and to identify samples of unknown classes as unknowns. Conventional node classification methods usually perform unsatisfactorily in open-set scenarios because of the complex data they encounter, such as out-of-distribution (OOD) data and in-distribution (IND) noise. OOD data are samples that do not belong to any known class. They are outliers if they occur in training (OOD noise) and open-set samples if they occur in testing. IND noise refers to training samples that are assigned incorrect labels. Both IND noise and OOD noise are prevalent and usually cause an ambiguity problem, which includes the intra-class variety problem and the inter-class confusion problem. Exploring robust open-set learning methods is therefore necessary but difficult, and it becomes even more difficult for non-IID graph data. To this end, we propose a unified framework named ROG$_{PL}$ that achieves robust open-set learning on complex noisy graph data by introducing prototype learning. Specifically, ROG$_{PL}$ consists of two modules: denoising via label propagation and open-set prototype learning via regions. The first module corrects noisy labels through similarity-based label propagation and removes low-confidence samples, which addresses the intra-class variety problem caused by noise. The second module learns open-set prototypes for each known class via non-overlapping regions and retains both interior and border prototypes to remedy the inter-class confusion problem. The two modules are updated iteratively under the constraints of a classification loss and a prototype diversity loss. To the best of our knowledge, ROG$_{PL}$ is the first robust open-set node classification method for graph data with complex noise.
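The abstract does not specify the similarity measure, graph construction, propagation rule, or confidence threshold used by the denoising module of ROG$_{PL}$. As a rough illustration only, the Python sketch below applies classic iterative label propagation, $F \leftarrow \alpha W F + (1-\alpha) Y$, over a cosine-similarity kNN graph built from node embeddings. It then relabels each node by its highest score and flags nodes whose top normalized score falls below a threshold. The function name `denoise_by_label_propagation` and the parameters `k`, `alpha`, `iters`, and `threshold` are hypothetical assumptions, not taken from the paper. The sketch also ignores the input graph's own edges and uses feature similarity only.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def denoise_by_label_propagation(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    k: int = 2,             # hypothetical: neighbours per node in the kNN graph
    alpha: float = 0.9,     # hypothetical: weight of propagated vs. observed labels
    iters: int = 50,        # hypothetical: number of propagation steps
    threshold: float = 0.65,  # hypothetical: minimum confidence to keep a node
) -> Tuple[List[int], List[bool]]:
    """Hypothetical helper: correct noisy labels and flag low-confidence nodes.

    Returns (corrected_labels, keep_mask).
    """
    n = len(features)
    # 1) Symmetric kNN graph weighted by non-negative cosine similarity.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted(
            ((cosine(features[i], features[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for s, j in ranked[:k]:
            w[i][j] = max(w[i][j], s, 0.0)
            w[j][i] = max(w[j][i], s, 0.0)
    # 2) Row-normalise so each node averages over its neighbours.
    for i in range(n):
        total = sum(w[i])
        if total > 0:
            w[i] = [x / total for x in w[i]]
    # 3) One-hot matrix Y of the observed (possibly noisy) labels.
    y = [[1.0 if c == labels[i] else 0.0 for c in range(num_classes)] for i in range(n)]
    # 4) Iterate F <- alpha * W F + (1 - alpha) * Y.
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [
            [
                alpha * sum(w[i][j] * f[j][c] for j in range(n)) + (1 - alpha) * y[i][c]
                for c in range(num_classes)
            ]
            for i in range(n)
        ]
    # 5) Relabel by argmax; keep only nodes whose top score clears the threshold.
    corrected: List[int] = []
    keep: List[bool] = []
    for i in range(n):
        total = sum(f[i])
        probs = [x / total for x in f[i]] if total > 0 else [1.0 / num_classes] * num_classes
        best = max(range(num_classes), key=lambda c: probs[c])
        corrected.append(best)
        keep.append(probs[best] >= threshold)
    return corrected, keep


if __name__ == "__main__":
    # Two toy clusters; node 2 sits in the class-0 cluster but carries label 1.
    feats = [[1, 0], [2, 0], [3, 0], [0, 1], [0, 2], [0, 3]]
    noisy = [0, 0, 1, 1, 1, 1]
    corrected, keep = denoise_by_label_propagation(feats, noisy, num_classes=2)
    print(corrected)  # [0, 0, 0, 1, 1, 1]
    print(keep)       # [True, True, False, True, True, True]
```

In this toy run the mislabeled node is relabeled to class 0, but its confidence (about 0.62) stays below the assumed threshold, so it would also be dropped. This mirrors the two actions the abstract names: correcting labels and removing low-confidence samples.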