We design the first node-differentially private algorithm for approximating the number of connected components in a graph. Given a database representing an $n$-vertex graph $G$ and a privacy parameter $\varepsilon$, our algorithm runs in polynomial time and, with probability $1-o(1)$, has additive error $\widetilde{O}(\frac{\Delta^*\ln\ln n}{\varepsilon}),$ where $\Delta^*$ is the smallest possible maximum degree of a spanning forest of $G.$ Node-differentially private algorithms are known only for a small number of database analysis tasks. A major obstacle for designing such an algorithm for the number of connected components is that this graph statistic is not robust to adding one node with arbitrary connections (a change that node-differential privacy is designed to hide): every graph is a neighbor of a connected graph. We overcome this by designing a family of efficiently computable Lipschitz extensions of the number of connected components or, equivalently, the size of a spanning forest. The construction of the extensions, which is at the core of our algorithm, is based on the forest polytope of $G.$ We prove several combinatorial facts about spanning forests, in particular, that a graph with no induced $\Delta$-stars has a spanning forest of degree at most $\Delta$. With this fact, we show that our Lipschitz extensions for the number of connected components equal the true value of the function for the largest possible monotone families of graphs. More generally, on all monotone sets of graphs, the $\ell_\infty$ error of our Lipschitz extensions is nearly optimal.
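The obstacle described above can be made concrete with a minimal sketch. The code below is not the paper's algorithm: it is a plain, non-private union-find count, and the names `count_components` and `add_hub` are hypothetical helpers introduced only for illustration. It uses the identity that the number of connected components equals $n$ minus the size of a spanning forest, and shows that adding a single node adjacent to every vertex collapses any graph to one component.

```python
def count_components(n, edges):
    """Count connected components of a graph on vertices 0..n-1.

    Uses the identity: components = n - (number of edges in a spanning forest).
    This is an ordinary, non-private computation, shown only for illustration.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest_size = 0
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # This edge joins two trees, so it belongs to the spanning forest.
            parent[ru] = rv
            forest_size += 1
    return n - forest_size


def add_hub(n, edges):
    """Return the node-neighbor obtained by adding vertex n adjacent to all others."""
    return n + 1, list(edges) + [(n, v) for v in range(n)]


# Example: 6 vertices with components {0,1}, {2,3}, {4}, {5}.
n, edges = 6, [(0, 1), (2, 3)]
before = count_components(n, edges)

# One added node with arbitrary connections makes the graph connected.
n_hub, edges_hub = add_hub(n, edges)
after = count_components(n_hub, edges_hub)

print(before, after)
```

In this example the count drops from 4 to 1 after a single node is added, and on an edgeless $n$-vertex graph the same change moves it from $n$ to 1. This is why noise calibrated to the worst-case change is too large to be useful, and why the abstract turns to Lipschitz extensions instead.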