Querying cohesive subgraphs on temporal graphs (e.g., social networks and financial networks) under various conditions has recently attracted intensive research interest. In this paper, we study a novel Temporal $(k,\mathcal{X})$-Core Query (TXCQ) that extends the fundamental Temporal $k$-Core Query (TCQ) proposed in our conference paper by optimizing or constraining an arbitrary metric $\mathcal{X}$ of a $k$-core, such as size, engagement, interaction frequency, time span, burstiness, or periodicity. Our objective is to address specific TXCQ instances with conditions on different $\mathcal{X}$ within a unified algorithmic framework that guarantees scalability. To that end, this journal paper proposes a taxonomy of the measurement $\mathcal{X}(\cdot)$ and achieves this objective with a two-phase framework when $\mathcal{X}(\cdot)$ is time-insensitive or time-monotonic. Specifically, Phase 1 still leverages the query processing algorithm of TCQ to induce all distinct $k$-cores within a given time range, and at the same time locates the "time zones" in which the cores emerge. Phase 2 then conducts a fast local search and $\mathcal{X}$ evaluation in each time zone, exploiting the time insensitivity or monotonicity of $\mathcal{X}(\cdot)$. By revealing two insightful concepts that bound time zones, the tightest time interval and the loosest time interval, redundant core induction and unnecessary $\mathcal{X}$ evaluation within a zone can be reduced dramatically. Our experimental results demonstrate that TXCQ can be addressed as efficiently as TCQ, which achieves the latest state-of-the-art performance, by using a general algorithm framework that leaves $\mathcal{X}(\cdot)$ as a user-defined function.
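To make the two-phase idea concrete, the following is a minimal brute-force sketch in Python, not the scalable algorithm of the paper. It rests on several assumptions that the abstract does not state: a temporal graph is a list of `(u, v, t)` edges, the degree of a vertex counts its distinct neighbors, a core is identified by its set of temporal edges, and the tightest time interval of a core is taken to be the span from its earliest to its latest edge timestamp. The names `k_core_edges`, `distinct_cores`, `txcq`, and `metric` are hypothetical. Phase 1 here simply enumerates every window of the query range, whereas the paper reuses the TCQ algorithm and prunes with time zones.

```python
from collections import defaultdict


def k_core_edges(edges, k):
    """Peel vertices with fewer than k distinct neighbors; return surviving edges."""
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    queue = [v for v in adj if len(adj[v]) < k]
    removed = set()
    while queue:
        v = queue.pop()
        if v in removed:
            continue
        removed.add(v)
        for w in adj[v]:
            adj[w].discard(v)
            if w not in removed and len(adj[w]) < k:
                queue.append(w)
        adj[v].clear()
    return frozenset(
        (u, v, t) for u, v, t in edges if u not in removed and v not in removed
    )


def distinct_cores(edges, k, ts, te):
    """Phase 1 (naive): map each distinct non-empty k-core to its tightest interval."""
    cores = {}
    stamps = sorted({t for _, _, t in edges if ts <= t <= te})
    for i, start in enumerate(stamps):
        for end in stamps[i:]:
            window = [e for e in edges if start <= e[2] <= end]
            core = k_core_edges(window, k)
            if core:
                times = [t for _, _, t in core]
                # Assumed definition: earliest and latest edge timestamp in the core.
                cores[core] = (min(times), max(times))
    return cores


def txcq(edges, k, ts, te, metric, maximize=True):
    """Phase 2 (naive): evaluate the user-defined metric on every distinct core."""
    cores = distinct_cores(edges, k, ts, te)
    if not cores:
        return None
    pick = max if maximize else min
    best = pick(cores, key=lambda core: metric(core, cores[core]))
    return best, cores[best]
```

A metric is any callable that takes the core's edge set and its interval, for example `lambda core, tti: len({x for u, v, _ in core for x in (u, v)})` for size or `lambda core, tti: tti[1] - tti[0]` for time span. Leaving the metric as a parameter mirrors the abstract's point that $\mathcal{X}(\cdot)$ is a user-defined function, while the quadratic window enumeration is exactly the redundancy that the time-zone bounds are meant to avoid.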