Evaluating Stability in Massive Social Networks: Efficient Streaming Algorithms for Structural Balance

Structural balance theory studies stability in networks. Given an $n$-vertex complete graph $G=(V,E)$ whose edges are labeled positive or negative, the graph is considered \emph{balanced} if every triangle consists of either three positive edges (three mutual ``friends'') or one positive edge and two negative edges (two ``friends'' with a common ``enemy''). From a computational perspective, structural balance turns out to be a special case of correlation clustering in which the number of clusters is at most two. The two main algorithmic problems of interest are: $(i)$ detecting whether a given graph is balanced, and $(ii)$ finding a partition that approximates the \emph{frustration index}, i.e., the minimum number of edge flips that make the graph balanced. We study these problems in the streaming model, where edges arrive one by one, and focus on \emph{memory efficiency}. We provide randomized single-pass algorithms for: $(i)$ determining whether an input graph is balanced with $O(\log{n})$ memory, and $(ii)$ finding a partition that induces a $(1 + \varepsilon)$-approximation to the frustration index with $O(n \cdot \text{polylog}(n))$ memory. We further provide several new lower bounds that complement different aspects of our algorithms, such as the need for randomization or approximation. To obtain our main results, we develop a method that uses pseudorandom generators (PRGs) to sample edges between independently chosen \emph{vertices} in graph streaming. Furthermore, our algorithm for approximating the frustration index improves the running time of the state-of-the-art algorithm for correlation clustering with two clusters (the Giotis-Guruswami algorithm [SODA 2006]) from $n^{O(1/\varepsilon^2)}$ to $O(n^2\log^3{n}/\varepsilon^2 + n\log n \cdot (1/\varepsilon)^{O(1/\varepsilon^4)})$ time for a $(1+\varepsilon)$-approximation. These results may be of independent interest.
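To make the two problems concrete, the following is a minimal Python sketch of simple baselines; it is not the algorithms of this paper. It relies on the standard fact that a signed complete graph is balanced exactly when its vertices split into at most two groups with positive edges inside each group and negative edges between them. The balance check processes edges one at a time but keeps $O(n)$ words of state (a union-find structure with parities), far more than the $O(\log{n})$ memory stated above, and the frustration index routine is an exponential-time brute force intended only for tiny inputs. All names (`ParityUnionFind`, `is_balanced_stream`, `frustration_index_bruteforce`) and the edge encoding `(u, v, sign)` with `sign` in `{+1, -1}` are assumptions made for illustration.

```python
class ParityUnionFind:
    """Union-find that also tracks on which side of a 2-partition each vertex lies."""

    def __init__(self, n):
        self.parent = list(range(n))
        # parity[v] is 0 if v is on the same side as parent[v], 1 otherwise.
        self.parity = [0] * n

    def find(self, v):
        """Return (root of v, parity of v relative to that root)."""
        path = []
        while self.parent[v] != v:
            path.append(v)
            v = self.parent[v]
        root = v
        # Path compression: re-attach every vertex on the path directly to the root.
        acc = 0
        for u in reversed(path):
            acc ^= self.parity[u]
            self.parent[u] = root
            self.parity[u] = acc
        return root, (self.parity[path[0]] if path else 0)

    def add_edge(self, u, v, sign):
        """Record a signed edge; return False if it contradicts earlier edges."""
        want = 0 if sign > 0 else 1  # positive: same side, negative: opposite sides
        ru, pu = self.find(u)
        rv, pv = self.find(v)
        if ru == rv:
            return (pu ^ pv) == want
        self.parent[ru] = rv
        self.parity[ru] = pu ^ pv ^ want
        return True


def is_balanced_stream(n, edges):
    """Single pass over signed edges; O(n) memory baseline (illustrative only)."""
    uf = ParityUnionFind(n)
    for u, v, sign in edges:
        if not uf.add_edge(u, v, sign):
            return False
    return True


def frustration_index_bruteforce(n, edges):
    """Minimum number of edge flips to reach balance, by trying all 2-partitions."""
    edges = list(edges)
    best = len(edges)
    # Vertex n-1 is fixed on side 0, so only 2^(n-1) partitions are enumerated.
    for mask in range(1 << (n - 1)):
        bad = 0
        for u, v, sign in edges:
            differ = ((mask >> u) & 1) != ((mask >> v) & 1)
            # A positive edge across the cut or a negative edge inside a side must be flipped.
            if differ == (sign > 0):
                bad += 1
        best = min(best, bad)
    return best


# Two friend groups {0, 1} and {2, 3} that dislike each other: balanced.
balanced_edges = [(0, 1, +1), (2, 3, +1), (0, 2, -1), (0, 3, -1), (1, 2, -1), (1, 3, -1)]
# Flipping edge (0, 1) creates an all-negative triangle {0, 1, 2}: unbalanced.
unbalanced_edges = [(0, 1, -1)] + balanced_edges[1:]

print(is_balanced_stream(4, balanced_edges))
print(is_balanced_stream(4, unbalanced_edges))
print(frustration_index_bruteforce(4, unbalanced_edges))
```

The brute force enumerates $2^{n-1}$ partitions, so it serves only as a ground-truth reference against which an approximate partition could be compared on very small graphs.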
