Community Search: A Meta-Learning Approach

Community Search (CS) is a fundamental graph analysis task and a building block of many real-world applications. Given any set of query nodes, CS aims to find cohesive subgraphs to which the query nodes belong. Many CS algorithms have been designed in recent years. These algorithms model communities with predefined subgraph patterns, so they cannot find ground-truth communities in real-world graphs that do not follow such patterns. Machine learning (ML) and deep learning (DL) based approaches have therefore been proposed to capture flexible community structures by learning from ground-truth communities in a data-driven fashion. These approaches rely on sufficient training data for the ML models to generalize; however, the ground truth cannot be comprehensively collected beforehand. In this paper, we study ML/DL-based approaches for CS when training data is scarce. Instead of directly fitting the small data, we extract prior knowledge shared across multiple CS tasks by learning a meta model. Each CS task is a graph with several queries, each with corresponding partial ground truth. The meta model can be swiftly adapted to a new task by feeding it a small amount of task-specific training data. We find that trivially applying classical meta-learning algorithms to CS suffers from problems in prediction effectiveness, generalization capability, and efficiency. To address these problems, we propose a novel meta-learning based framework, Conditional Graph Neural Process (CGNP), to carry out the prior extraction and adaptation procedure. A meta CGNP model is a task-common node embedding function for clustering, learned by metric-based graph learning, which fully exploits the characteristics of CS. We compare CGNP with CS algorithms and ML baselines on real graphs with ground-truth communities.
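To make the metric-based idea concrete, the following is a minimal sketch of how community membership could be read off node embeddings once an embedding function is available. It is an illustration under stated assumptions, not the CGNP implementation: the abstract only says the model is a node embedding function learned by metric-based graph learning, so the inner-product-plus-sigmoid score, the 0.5 threshold, the function name `predict_community`, and the hand-written toy embeddings are all hypothetical choices made here.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping a similarity score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_community(
    embeddings: dict[int, list[float]],
    query_node: int,
    threshold: float = 0.5,
) -> set[int]:
    """Return the nodes whose embedding is close to the query node's embedding.

    Assumption: closeness is the sigmoid of the inner product. In a learned
    model the embeddings would come from a trained, task-adapted network;
    here they are supplied directly as toy vectors.
    """
    q = embeddings[query_node]
    members = set()
    for node, vec in embeddings.items():
        score = sigmoid(sum(a * b for a, b in zip(q, vec)))
        if score >= threshold:
            members.add(node)
    return members


# Toy embeddings (hypothetical): nodes 0-2 form one cluster, nodes 3-4 another.
toy_embeddings = {
    0: [1.0, 0.0],
    1: [0.9, 0.1],
    2: [0.8, 0.2],
    3: [-1.0, 0.5],
    4: [-0.9, 0.4],
}

community_of_0 = predict_community(toy_embeddings, query_node=0)
community_of_3 = predict_community(toy_embeddings, query_node=3)
```

In this toy setting, querying node 0 and querying node 3 recover the two separate clusters. The point of the sketch is only that, with a shared embedding function, answering a query reduces to a similarity comparison against the query node.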

