Subgraph counting is a fundamental problem in understanding and analyzing graph-structured data, yet it is computationally challenging. This calls for an accurate and efficient algorithm for Subgraph Cardinality Estimation, the task of estimating the number of all isomorphic embeddings of a query graph in a data graph. We present FaSTest, a novel algorithm that combines (1) a powerful filtering technique that significantly reduces the sample space, (2) an adaptive tree sampling algorithm for accurate and efficient estimation, and (3) a worst-case optimal stratified graph sampling algorithm for difficult instances. Extensive experiments on real-world datasets show that, in terms of accuracy, FaSTest outperforms state-of-the-art sampling-based methods by up to two orders of magnitude and GNN-based methods by up to three orders of magnitude.
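To make the estimated quantity concrete, the sketch below counts embeddings exactly by brute-force backtracking. It is not FaSTest and uses none of its filtering or sampling. It only illustrates the number that a cardinality estimator approximates. It assumes unlabeled, undirected graphs and non-induced embeddings, meaning injective vertex maps that send every query edge to a data edge. The helper names `make_graph` and `count_embeddings` are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumption: unlabeled, undirected graphs stored as adjacency sets.
Graph = Dict[int, Set[int]]


def make_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list (hypothetical helper)."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def count_embeddings(query: Graph, data: Graph) -> int:
    """Exactly count injective maps f: V(query) -> V(data) such that every
    query edge (u, w) maps to a data edge (f(u), f(w)), i.e. non-induced embeddings.
    Runs in exponential time in general, which is why estimation is needed."""
    order = list(query)          # naive matching order; no filtering is applied
    mapping: Dict[int, int] = {} # query vertex -> data vertex
    used: Set[int] = set()       # data vertices already taken (injectivity)

    def backtrack(i: int) -> int:
        if i == len(order):
            return 1             # every query vertex is mapped: one full embedding
        u = order[i]
        total = 0
        for v in data:
            if v in used:
                continue
            # each already-mapped query neighbour of u must map to a data neighbour of v
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                total += backtrack(i + 1)
                used.remove(v)
                del mapping[u]
        return total

    return backtrack(0)
```

For example, a triangle query on the complete graph K4 gives 24 embeddings. K4 contains 4 triangles, and each one is counted once for every one of its 3! = 6 automorphisms. Even on small inputs, this enumeration grows combinatorially, and that growth is what motivates sampling-based estimation.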