Given a graph $G$, a query node $q$, and an integer $k$, community search (CS) seeks a cohesive subgraph of $G$ that contains $q$, where cohesiveness is measured by a community model such as the $k$-core or $k$-truss. Ordinary users with limited knowledge of graph structure find it difficult to choose an appropriate $k$. Even with a fairly large $k$, the community returned by CS is often too large for users to gain much insight from it. Within a community, however, some members (key-members) are more valuable than the rest. We therefore study the Community Key-members Search problem (CKS), shifting our focus from the entire community containing $q$ to its key-members. To solve CKS, we first propose an exact baseline algorithm based on truss decomposition. We then present four optimized random walk-based algorithms that trade off effectiveness against efficiency by incorporating three important cohesiveness features into the design of the transition matrix; key-members are returned according to the stationary distribution once the random walk converges. We justify this cohesiveness-aware transition matrix theoretically through a Bayesian analysis based on a Gaussian Mixture Model with Box-Cox transformation and copula function fitting. Moreover, we propose a lightweight refinement method that follows an ``expand-replace'' strategy to further improve the result with little overhead, and we extend our method to CKS with multiple query nodes. Comprehensive experiments on various real-world datasets demonstrate the superiority of our method.
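The exact baseline builds on truss decomposition, which assigns every edge its trussness: the largest $k$ such that the edge belongs to a $k$-truss (every edge of a $k$-truss lies in at least $k-2$ triangles within it). The sketch below shows the standard peeling procedure under the assumptions that the graph is undirected, simple, and has comparable node ids. It is not the paper's baseline itself, and the name `truss_decomposition` is hypothetical.

```python
from collections import defaultdict


def truss_decomposition(edges):
    """Return {(u, v): trussness} for each undirected edge, with u < v.

    Hypothetical helper: standard support-peeling truss decomposition.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    def edge_key(a, b):
        return (a, b) if a < b else (b, a)

    # Support of an edge = number of triangles that contain it.
    support = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                support[(u, v)] = len(adj[u] & adj[v])

    trussness = {}
    k = 2
    while support:
        # Edges with support <= k - 2 cannot survive in a (k+1)-truss.
        queue = [e for e, s in support.items() if s <= k - 2]
        if not queue:
            k += 1
            continue
        while queue:
            e = queue.pop()
            if e not in support:  # already peeled via a duplicate entry
                continue
            u, v = e
            trussness[e] = k
            del support[e]
            # Every triangle (u, v, w) disappears with e; update its other edges.
            for w in adj[u] & adj[v]:
                for f in (edge_key(u, w), edge_key(v, w)):
                    if f in support:
                        support[f] -= 1
                        if support[f] <= k - 2:
                            queue.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
    return trussness
```

For example, on a 4-clique over nodes 0 to 3 with an extra pendant edge (3, 4), every clique edge has trussness 4 and the pendant edge has trussness 2. The $k$-truss community of $q$ can then be read off as the connected edges of trussness at least $k$ that touch $q$.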
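The random walk-based algorithms rank nodes by the stationary distribution of a walk whose transition matrix favors cohesive edges. The abstract names neither the three cohesiveness features nor whether the walk restarts at $q$, so the sketch below makes two loud assumptions: a single hypothetical feature (edge support, i.e., the number of triangles on an edge, plus one) as the edge weight, and a restart to $q$ with probability `restart` so that the distribution depends on the query. The function `key_members_rwr` and all its parameters are illustrative, not the paper's method.

```python
import numpy as np


def key_members_rwr(adj_matrix, q, n_top, restart=0.15, tol=1e-10, max_iter=1000):
    """Rank nodes by a random walk with restart on a cohesiveness-weighted graph.

    Hypothetical sketch: edge weight = support (triangle count) + 1.
    Returns (top node ids, stationary vector).
    """
    A = np.asarray(adj_matrix, dtype=float)  # symmetric 0/1 matrix, zero diagonal
    # (A @ A)[i, j] counts common neighbours; masking by A keeps only edges.
    support = (A @ A) * A
    W = support + A  # weight support + 1 on edges, 0 elsewhere
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes: avoid division by zero
    P = W / col_sums  # column-stochastic: P[i, j] = Pr(j -> i)

    e_q = np.zeros(A.shape[0])
    e_q[q] = 1.0
    p = e_q.copy()
    for _ in range(max_iter):
        # Power iteration for p = (1 - c) P p + c e_q.
        p_next = (1 - restart) * (P @ p) + restart * e_q
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    order = np.argsort(-p, kind="stable")
    return [int(i) for i in order[:n_top]], p
```

On a 4-clique over nodes 0 to 3 with a tail 3-4-5 and $q = 0$, the four clique nodes receive far more probability mass than the tail, so they are returned as key-members. Weighting by triangle count is only one way to make the walk cohesiveness-aware; the same power iteration applies to any non-negative edge weighting.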