Community Search (CS) is a fundamental graph analysis task and a building block of many real-world applications. Given a set of query nodes, CS aims to find cohesive subgraphs that contain them. Many CS algorithms have been designed in recent years. These algorithms model communities with predefined subgraph patterns, and therefore cannot find ground-truth communities in real-world graphs that do not follow such patterns. To address this, machine learning (ML) and deep learning (DL) based approaches have been proposed to capture flexible community structures by learning from ground-truth communities in a data-driven fashion. These approaches rely on sufficient training data for the ML models to generalize; however, ground-truth communities cannot be comprehensively collected beforehand. In this paper, we study ML/DL-based approaches for CS when training data is scarce. Instead of directly fitting the small data, we extract prior knowledge shared across multiple CS tasks by learning a meta model. Each CS task is a graph with several queries, each with corresponding partial ground truth. The meta model can be swiftly adapted to a new task by feeding it a small amount of task-specific training data. We find that trivially applying multiple classical meta-learning algorithms to CS suffers from problems in prediction effectiveness, generalization capability, and efficiency. To address these problems, we propose a novel meta-learning based framework, Conditional Graph Neural Process (CGNP), to carry out the prior extraction and adaptation procedure. A meta CGNP model is a task-common node embedding function for clustering, learned by metric-based graph learning, which fully exploits the characteristics of CS. We compare CGNP with CS algorithms and ML baselines on real graphs with ground-truth communities.
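To make the idea of a node embedding function used for metric-based clustering more concrete, the following is a minimal sketch, not the CGNP model itself. It assumes that some learned function has already produced one embedding vector per node, and that community membership for a query node is decided by thresholding a sigmoid of the inner product between embeddings. The function names, the toy embeddings, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math


def inner_product(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict_community(embeddings, query, threshold=0.5):
    """Return the nodes whose similarity to the query node passes the threshold.

    `embeddings` maps node id -> embedding vector. In a learned setting these
    vectors would come from a trained embedding function; here they are
    supplied directly (an assumption made for illustration).
    """
    q = embeddings[query]
    members = set()
    for node, vec in embeddings.items():
        # Sigmoid of the inner product acts as a membership probability.
        score = 1.0 / (1.0 + math.exp(-inner_product(q, vec)))
        if score >= threshold:
            members.add(node)
    return members


# Hand-made toy embeddings: "a" and "b" point one way, "c" and "d" the other.
toy_embeddings = {
    "a": [1.0, 0.2],
    "b": [0.9, 0.1],
    "c": [-1.0, 0.3],
    "d": [-0.8, -0.5],
}

community = predict_community(toy_embeddings, "a")
```

In this sketch, nodes embedded close to the query (in the inner-product sense) are returned as its community. Adapting to a new task would then amount to producing embeddings conditioned on that task's few labeled queries, a step this example does not model.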