Community Search (CS) is one of the fundamental graph analysis tasks and a building block of various real-world applications. Given a set of query nodes, CS aims to find the cohesive subgraphs to which the query nodes belong. A large number of CS algorithms have been designed recently. These algorithms model communities with predefined subgraph patterns, so they cannot find ground-truth communities in real-world graphs that do not follow such patterns. Machine learning (ML) and deep learning (DL) based approaches have therefore been proposed to capture flexible community structures by learning from ground-truth communities in a data-driven fashion. These approaches rely on sufficient training data for the ML models to generalize; however, ground-truth communities cannot be comprehensively collected beforehand. In this paper, we study ML/DL-based approaches for CS when only a small amount of training data is available. Instead of directly fitting the small data, we extract prior knowledge shared across multiple CS tasks by learning a meta model. Each CS task is a graph with several queries, each of which has a corresponding partial ground-truth community. The meta model can be swiftly adapted to a new task by feeding it a few task-specific training examples. We find that naively applying several classical meta-learning algorithms to CS suffers from problems in prediction effectiveness, generalization capability, and efficiency. To address these problems, we propose a novel meta-learning framework, Conditional Graph Neural Process (CGNP), to carry out the prior extraction and adaptation procedure. A meta CGNP model is a task-common node embedding function for clustering, learned by metric-based graph learning, which fully exploits the characteristics of CS. We compare CGNP with CS algorithms and ML baselines on real graphs with ground-truth communities.
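The idea of a task-common embedding function that is adapted by feeding it task-specific data, rather than by retraining, can be illustrated with a small sketch. This is not the authors' CGNP implementation. It rests on several assumptions. An untrained one-round mean-aggregation step stands in for the meta-learned GNN encoder. The query and its known community members (the partial ground truth) are averaged into a prototype. Every node is then scored by the sigmoid of its inner product with that prototype and kept if the score reaches 0.5. The names `mean_aggregate` and `predict_community`, the toy graph, and the features are all hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product used as the (assumed) similarity metric."""
    return sum(x * y for x, y in zip(a, b))


def mean_aggregate(graph: Dict[int, List[int]], feats: Dict[int, Vector]) -> Dict[int, Vector]:
    """One round of mean neighbourhood aggregation.

    Hypothetical stand-in for a trained, task-common GNN encoder:
    each node's embedding is the mean of its own and its neighbours' features.
    """
    out: Dict[int, Vector] = {}
    for v, nbrs in graph.items():
        group = [v] + nbrs
        dim = len(feats[v])
        out[v] = [sum(feats[u][i] for u in group) / len(group) for i in range(dim)]
    return out


def predict_community(graph: Dict[int, List[int]], feats: Dict[int, Vector],
                      query: int, known_members: Iterable[int],
                      threshold: float = 0.5) -> Set[int]:
    """Score every node against a prototype built from the task-specific context.

    Assumption: the context (query + partial ground truth) is summarised by
    averaging embeddings; no parameters are updated at prediction time.
    """
    emb = mean_aggregate(graph, feats)
    context = [query] + list(known_members)
    dim = len(emb[query])
    proto = [sum(emb[u][i] for u in context) / len(context) for i in range(dim)]
    members: Set[int] = set()
    for v in graph:
        prob = 1.0 / (1.0 + math.exp(-dot(emb[v], proto)))  # sigmoid of similarity
        if prob >= threshold:
            members.add(v)
    return members


# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
feats = {0: [1.0], 1: [1.0], 2: [1.0], 3: [-1.0], 4: [-1.0], 5: [-1.0]}

print(sorted(predict_community(graph, feats, query=0, known_members=[1])))  # [0, 1, 2]
print(sorted(predict_community(graph, feats, query=4, known_members=[5])))  # [3, 4, 5]
```

Changing the query and its known members changes the prediction without touching the encoder. This is the property that lets one shared model serve many CS tasks. In a real system the encoder would be learned across tasks rather than fixed as above.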