Suppose we wish to estimate $\#H$, the number of copies of some small graph $H$ in a large streaming graph $G$. There are many algorithms for this task when $H$ is a triangle, but only a few that apply to arbitrary $H$. We focus on one such algorithm, introduced by Kane, Mehlhorn, Sauerwald, and Sun. The storage and the update time per edge of their algorithm are both $O(m^k/(\#H)^2)$, where $m$ is the number of edges in $G$ and $k$ is the number of edges in $H$. We propose three modifications to their algorithm that can dramatically reduce both the storage and the update time. Suppose that $H$ has no leaves and that $G$ has maximum degree at most $m^{1/2 - \alpha}$ for some $\alpha > 0$. Define $C = \min(m^{2\alpha}, m^{1/3})$. Then in our version of the algorithm, the update time per edge is $O(1)$, and the storage is reduced by a factor of approximately $C^{2k-t-2}$, where $t$ is the number of vertices in $H$; specifically, the storage is $O(C^2 + m^k/(C^{2k-t-2} (\#H)^2))$.
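To get a feel for the two storage bounds, the following minimal sketch evaluates $m^k/(\#H)^2$ and $C^2 + m^k/(C^{2k-t-2} (\#H)^2)$ numerically. It is only an illustration: the function name `storage_bounds` and the sample values of $m$, $\alpha$, and $\#H$ are assumptions chosen for this example, and the constants hidden by the $O(\cdot)$ notation are ignored.

```python
def storage_bounds(m, k, t, alpha, num_copies):
    """Evaluate the two storage expressions, ignoring big-O constants.

    m          -- number of edges in G
    k, t       -- number of edges and vertices in H (H assumed leafless)
    alpha      -- G is assumed to have maximum degree <= m^(1/2 - alpha)
    num_copies -- #H, the number of copies of H in G
    """
    # C = min(m^(2*alpha), m^(1/3)), as defined in the text.
    C = min(m ** (2 * alpha), m ** (1 / 3))
    # Bound stated for the original algorithm: m^k / (#H)^2.
    original = m ** k / num_copies ** 2
    # Bound stated for the modified algorithm:
    # C^2 + m^k / (C^(2k - t - 2) * (#H)^2).
    improved = C ** 2 + m ** k / (C ** (2 * k - t - 2) * num_copies ** 2)
    return C, original, improved


# Illustrative (assumed) values: H is a triangle, so k = 3 and t = 3.
C, original, improved = storage_bounds(
    m=10**6, k=3, t=3, alpha=1 / 6, num_copies=10**6
)
print(f"C = {C:.1f}, original = {original:.3g}, improved = {improved:.3g}")
```

With these assumed values, $C \approx 100$ and the exponent $2k-t-2$ equals $1$, so the first expression is about $10^6$ while the second is about $2 \times 10^4$. The additive $C^2$ term matters here: it contributes half of the second value, which is why the reduction is described as approximate.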