Engineering Semi-streaming DFS algorithms

Depth-first search (DFS) is a fundamental graph problem with a wide range of applications. For a graph $G=(V,E)$ with $n$ vertices and $m$ edges, the DFS tree can be computed in $O(m+n)$ time using $O(m)$ space, where $m=O(n^2)$. In the streaming environment, most graph problems are studied in the semi-streaming model, where several passes (preferably one) are allowed over the input with $O(nk)$ local space for some $k=o(n)$. Trivially, DFS can be computed in one pass using $O(m)$ space, and in $O(n)$ passes using $O(n)$ space. Khan and Mehta [STACS19] presented several algorithms offering trade-offs between space and passes, where $O(nk)$ space results in $O(n/k)$ passes. Their empirical analysis also showed that their algorithms require only a few passes in practice, even with $O(n)$ space. Chang et al. [STACS20] presented an alternative proof of the same result, along with an $O(\sqrt{n})$-pass algorithm requiring $O(n\,\mathrm{poly}\log n)$ space and a finer trade-off between space and passes. However, their algorithm relies on complex black-box algorithms, making it impractical. We perform an experimental analysis of the practical semi-streaming DFS algorithms on both real graphs and random graphs (uniform and power-law). We also present several heuristics to improve the state-of-the-art algorithms and study their impact. On real graphs, our heuristics improve the state of the art by $40$-$90\%$, achieving the optimal single pass in almost $40$-$50\%$ of cases (up from zero). On random graphs, the improvement ranges from $30\%$ to $90\%$, again requiring only the optimal single pass even for very small values of $k$. Overall, our heuristics significantly improve the relatively complex state-of-the-art algorithm, which then requires merely two passes in the worst case for random graphs. Additionally, our heuristics make the relatively simpler algorithm practically usable even for very small space bounds, where it was previously impractical.
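
To make the trivial $O(n)$-space, $O(n)$-pass bound concrete, the following is a minimal Python sketch of one way to obtain it. It is not one of the algorithms of Khan and Mehta or Chang et al., and it assumes an undirected graph whose edge stream can be replayed. The names `streaming_dfs` and `edge_stream` are hypothetical, chosen only for this illustration. Each pass records at most one unvisited neighbour per vertex on the current DFS stack, so every pass except the last adds exactly one vertex to the tree.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Edge = Tuple[int, int]


def streaming_dfs(
    edge_stream: Callable[[], Iterable[Edge]], root: int = 0
) -> Tuple[Dict[int, Optional[int]], int]:
    """Sketch (assumed interface): DFS tree of root's component in O(n) space.

    edge_stream() must return a fresh iterator over all edges; each call
    counts as one pass. Returns (parent map, number of passes used).
    """
    parent: Dict[int, Optional[int]] = {root: None}  # visited vertices
    stack: List[int] = [root]                        # current root-to-leaf path
    on_stack = {root}
    passes = 0

    while stack:
        passes += 1
        # For each stack vertex, remember one unvisited neighbour (O(n) space).
        candidate: Dict[int, int] = {}
        for u, v in edge_stream():
            for a, b in ((u, v), (v, u)):
                if a in on_stack and b not in parent:
                    candidate.setdefault(a, b)

        # Stack vertices without a candidate have no unvisited neighbour,
        # so they are finished: backtrack past them.
        while stack and stack[-1] not in candidate:
            on_stack.discard(stack.pop())

        if stack:
            top = stack[-1]
            child = candidate[top]
            parent[child] = top
            stack.append(child)
            on_stack.add(child)

    return parent, passes
```

For a connected graph on $n$ vertices this sketch uses $n$ passes: $n-1$ passes that each add one vertex, plus a final pass that finds no unvisited neighbour. For example, on the edge list `[(0, 1), (0, 2), (1, 2), (2, 3)]` with root `0` it builds the path 0, 1, 2, 3, and the edge `(0, 2)` becomes a back edge.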
