Given a graph $G$, a query node $q$, and an integer $k$, community search (CS) seeks a cohesive subgraph of $G$ that contains $q$, where cohesiveness is measured by a community model such as the $k$-core or $k$-truss. Ordinary users with limited knowledge of graph structure find it difficult to choose an appropriate $k$. Moreover, even with a fairly large $k$, the community returned by CS is often too large for users to gain much insight from it, whereas its key-members are typically more valuable than the remaining members. To address this, we study the Community Key-members Search (CKS) problem, which shifts the focus from the entire community containing $q$ to its key-members. To solve CKS, we first propose an exact algorithm based on truss decomposition as a baseline. We then present four optimized random-walk-based algorithms that trade off effectiveness against efficiency by incorporating three important cohesiveness features into the design of the transition matrix; key-members are returned according to the stationary distribution once the random walk converges. We theoretically justify the design of this cohesiveness-aware transition matrix through a Bayesian analysis based on a Gaussian Mixture Model with Box-Cox transformation and copula function fitting. In addition, we propose a lightweight refinement method that follows an ``expand-replace'' strategy to further improve the result with little overhead, and we extend our method to CKS with multiple query nodes. Comprehensive experiments on various real-world datasets demonstrate the superiority of our method.
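The abstract does not spell out the three cohesiveness features or the exact transition matrix, so the following is only a minimal sketch of the general idea of ranking nodes around $q$ by the stationary distribution of a cohesiveness-weighted random walk. Two choices here are our own assumptions, not the paper's method: the walk restarts at $q$ with probability `alpha` (a standard random walk with restart, which makes the distribution query-specific), and each edge is weighted by `1 + support(u, v)`, where support is the number of triangles containing the edge (the quantity underlying the $k$-truss). This weight is a hypothetical stand-in for the paper's features. The function names `key_members` and `edge_support` are likewise hypothetical.

```python
def edge_support(adj):
    """Count triangles containing each directed edge (u, v): |N(u) & N(v)|."""
    return {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u]}


def key_members(adj, q, top=3, alpha=0.15, tol=1e-10, max_iter=1000):
    """Rank nodes by the stationary distribution of a random walk with restart at q.

    Assumes an undirected graph given as a dict of neighbor sets in which every
    node has at least one neighbor. The transition weight w(u, v) = 1 + support(u, v)
    is a hypothetical cohesiveness proxy: the walk prefers edges in many triangles.
    """
    sup = edge_support(adj)
    # Row-normalized transition probabilities P[u][v].
    P = {}
    for u in adj:
        total = sum(1 + sup[(u, v)] for v in adj[u])
        P[u] = {v: (1 + sup[(u, v)]) / total for v in adj[u]}

    # Power iteration: pi <- (1 - alpha) * pi P + alpha * e_q until convergence.
    pi = {u: 0.0 for u in adj}
    pi[q] = 1.0
    for _ in range(max_iter):
        nxt = {u: 0.0 for u in adj}
        for u, mass in pi.items():
            for v, p in P[u].items():
                nxt[v] += (1 - alpha) * mass * p
        nxt[q] += alpha
        diff = sum(abs(nxt[u] - pi[u]) for u in adj)
        pi = nxt
        if diff < tol:
            break

    # Return the highest-probability nodes other than q as candidate key-members.
    ranked = sorted((u for u in adj if u != q), key=lambda u: (-pi[u], u))
    return ranked[:top], pi


# Usage: q sits in a 4-clique {q, a, b, c} and also has a pendant path q - x - y.
adj = {
    "q": {"a", "b", "c", "x"},
    "a": {"q", "b", "c"},
    "b": {"q", "a", "c"},
    "c": {"q", "a", "b"},
    "x": {"q", "y"},
    "y": {"x"},
}
top, pi = key_members(adj, "q", top=3)
print(sorted(top))  # ['a', 'b', 'c']
```

In this toy graph the clique edges lie in two triangles each while the path edges lie in none, so the walk concentrates its mass on the clique members and returns them ahead of `x` and `y`.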