Triangle counting and triangle sampling are two fundamental problems for streaming algorithms. Arguably, designing sampling algorithms is more challenging than designing their counting counterparts, yet triangle counting has received far more attention in the literature than triangle sampling. In this work, we consider the problem of approximately sampling triangles in several streaming models, with a focus on the adjacency list model. In this problem, the edges of a graph $G$ arrive over a data stream. The goal is to design space-efficient streaming algorithms that sample and output a triangle from a distribution over the triangles in $G$ that is close to the uniform distribution over those triangles, where the distance between distributions is measured in $\ell_1$-distance. The main technical contribution of this paper is a set of algorithms for this triangle sampling problem in the adjacency list model whose space complexities match those of the corresponding counting algorithms. For the sake of completeness, we also show results for the vertex arrival and edge arrival models.
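One way to make the closeness guarantee precise, as a sketch only: write $T(G)$ for the (nonempty) set of triangles of $G$, $\mathcal{D}$ for the output distribution of the algorithm, and $\varepsilon \in (0,1)$ for an accuracy parameter. These three symbols are notation assumed here for illustration and are not taken from the text above.

$$
\sum_{t \in T(G)} \left| \Pr_{\mathcal{D}}[t] - \frac{1}{|T(G)|} \right| \le \varepsilon
$$

Under this reading, $\varepsilon = 0$ would correspond to exact uniform sampling, and a larger $\varepsilon$ permits a larger deviation from uniform.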