Subgraph counting is a fundamental yet computationally challenging problem in understanding and analyzing graph-structured data. This calls for an accurate and efficient algorithm for Subgraph Cardinality Estimation, the task of estimating the number of all isomorphic embeddings of a query graph in a data graph. We present FaSTest, a novel algorithm that combines (1) a powerful filtering technique that significantly reduces the sample space, (2) an adaptive tree sampling algorithm for accurate and efficient estimation, and (3) a worst-case optimal stratified graph sampling algorithm for difficult instances. Extensive experiments on real-world datasets show that, in terms of accuracy, FaSTest outperforms state-of-the-art sampling-based methods by up to two orders of magnitude and GNN-based methods by up to three orders of magnitude.
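To make the problem statement concrete, the following minimal Python sketch contrasts exact counting with a generic sampling-based estimator. It is not FaSTest and implements none of its three components (filtering, adaptive tree sampling, stratified graph sampling). It is a plain random-extension estimator in the style of Knuth's tree-size estimation, under these assumptions: graphs are unlabeled and undirected, stored as dictionaries mapping each vertex to its set of neighbors, and an embedding is an injective mapping of query vertices to data vertices that preserves every query edge. All function names are illustrative.

```python
import random


def candidates(query, data, mapping, u):
    """Data vertices that query vertex u may map to, given a partial mapping."""
    used = set(mapping.values())
    # Images of the query neighbors of u that are already mapped.
    mapped_nbrs = [mapping[w] for w in query[u] if w in mapping]
    if mapped_nbrs:
        # u must be adjacent to the images of all its mapped neighbors.
        pool = set.intersection(*(data[x] for x in mapped_nbrs))
    else:
        pool = set(data)
    # Injectivity: a data vertex may be used at most once.
    return sorted(pool - used)


def count_embeddings(query, data):
    """Exact count by backtracking (exponential time; tiny inputs only)."""
    order = list(query)

    def extend(mapping):
        if len(mapping) == len(order):
            return 1
        u = order[len(mapping)]
        total = 0
        for v in candidates(query, data, mapping, u):
            mapping[u] = v
            total += extend(mapping)
            del mapping[u]
        return total

    return extend({})


def sample_once(query, data, rng):
    """One random root-to-leaf walk in the search tree.

    Returns the product of the branching factors along the walk, or 0 if the
    walk reaches a dead end. Its expectation equals the exact count.
    """
    mapping, weight = {}, 1
    for u in query:
        cands = candidates(query, data, mapping, u)
        if not cands:
            return 0
        mapping[u] = rng.choice(cands)
        weight *= len(cands)
    return weight


def estimate_embeddings(query, data, num_samples=1000, seed=0):
    """Average of independent single-walk estimates."""
    rng = random.Random(seed)
    total = sum(sample_once(query, data, rng) for _ in range(num_samples))
    return total / num_samples


# Triangle query in the complete graph K4: 4 * 3 * 2 = 24 embeddings.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
exact = count_embeddings(triangle, k4)
approx = estimate_embeddings(triangle, k4)
```

On K4 every walk has branching factors 4, 3, and 2, so each sample equals the exact count of 24. On irregular graphs the per-sample values vary, and dead-end walks contribute 0, which is where the variance of naive sampling comes from and why shrinking the sample space before sampling matters.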