Random Walk-based Community Key-members Search over Large Graphs

Given a graph $G$, a query node $q$, and an integer $k$, community search (CS) seeks a cohesive subgraph of $G$ that contains $q$, where cohesiveness is measured by a community model such as $k$-core or $k$-truss. Ordinary users who know little about a graph's structure find it difficult to set an appropriate $k$. Even with a fairly large $k$, the community returned by CS is often too large for users to gain much insight from it. Within a community, the key-members are more valuable than the rest. We therefore study the Community Key-members Search (CKS) problem, which shifts the focus from the entire community containing $q$ to its key-members. To solve the CKS problem, we first propose an exact algorithm based on truss decomposition as a baseline. We then present four optimized random walk-based algorithms that trade off effectiveness against efficiency by incorporating three important cohesiveness features into the design of the transition matrix; the key-members are returned according to the stationary distribution once the random walk converges. We theoretically justify the design of the cohesiveness-aware transition matrix through Bayesian theory based on a Gaussian Mixture Model with Box-Cox transformation and copula function fitting. Moreover, we propose a lightweight refinement method that follows an ``expand-replace'' strategy to further improve the result with little overhead, and we extend our method to CKS with multiple query nodes. Comprehensive experiments on various real-world datasets demonstrate the superiority of our method.
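
To make the random-walk idea concrete, the following is a minimal sketch and not the algorithm proposed here. It assumes a single cohesiveness feature (the triangle support of each edge, which is the quantity behind $k$-truss) and a random walk with restart at $q$; the three features and the four algorithms mentioned above are not specified in this abstract. The names `triangle_support`, `key_members`, `restart`, and `toy` are hypothetical and chosen only for illustration.

```python
def triangle_support(adj):
    """Map each directed edge (u, v) to the number of triangles containing it."""
    return {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u]}


def key_members(adj, q, top=3, restart=0.15, tol=1e-10, max_iter=1000):
    """Rank nodes by a cohesiveness-weighted random walk with restart at q.

    Assumption: the transition weight of edge (u, v) is 1 + triangle support,
    so the walk prefers edges inside dense regions. This is an illustrative
    choice, not the transition matrix designed in the paper.
    """
    support = triangle_support(adj)

    # Row-normalise the weights so each row is a probability distribution.
    trans = {}
    for u in adj:
        weights = {v: 1.0 + support[(u, v)] for v in adj[u]}
        total = sum(weights.values())
        # A node without neighbours sends the walker back to q.
        trans[u] = {v: w / total for v, w in weights.items()} if total else {q: 1.0}

    # Power iteration: p <- restart * e_q + (1 - restart) * p * P.
    p = {u: 1.0 if u == q else 0.0 for u in adj}
    for _ in range(max_iter):
        nxt = {u: (restart if u == q else 0.0) for u in adj}
        for u, row in trans.items():
            for v, w in row.items():
                nxt[v] += (1.0 - restart) * p[u] * w
        delta = sum(abs(nxt[u] - p[u]) for u in adj)
        p = nxt
        if delta < tol:
            break

    # Highest stationary probability first; ties broken by node id.
    ranked = sorted(p, key=lambda u: (-p[u], u))
    return ranked[:top], p


# Toy graph: a 4-clique {0, 1, 2, 3} with a tail 3 - 4 - 5.
toy = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4},
    4: {3, 5},
    5: {4},
}
members, scores = key_members(toy, q=0, top=4)
```

On this toy graph the four clique nodes rank above the two tail nodes, which is the behaviour one would expect from a walk biased toward cohesive edges. Whether triangle support alone is an adequate feature is exactly the kind of question the theoretical analysis above is meant to address.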
