The goal of community detection over graphs is to recover the underlying labels or attributes of users (e.g., political affiliation) given the connectivity between users, represented by the adjacency matrix of a graph. There has been significant recent progress on understanding the fundamental limits of community detection when the graph is generated from a stochastic block model (SBM). Specifically, sharp information-theoretic limits and efficient algorithms have been obtained for SBMs as a function of $p$ and $q$, the intra-community and inter-community connection probabilities, respectively. In this paper, we study the community detection problem while preserving the privacy of the individual connections (edges) between the vertices. Focusing on the notion of $(\epsilon, \delta)$-edge differential privacy (DP), we seek to understand the fundamental tradeoffs between $(p, q)$, the DP budget $(\epsilon, \delta)$, and computational efficiency for exact recovery of the community labels. To this end, we present and analyze the associated information-theoretic tradeoffs for three broad classes of differentially private community recovery mechanisms: a) stability-based mechanisms; b) sampling-based mechanisms; and c) graph perturbation mechanisms. Our main findings are that stability-based and sampling-based mechanisms lead to a superior tradeoff between $(p,q)$ and the privacy budget $(\epsilon, \delta)$; however, this comes at the expense of higher computational complexity. On the other hand, graph perturbation mechanisms, although of low complexity, require the privacy budget $\epsilon$ to scale as $\Omega(\log(n))$ for exact recovery. To the best of our knowledge, this is the first work to study the impact of privacy constraints on the fundamental limits for community detection.
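As an illustration of the graph perturbation class of mechanisms, the following is a minimal sketch and not the paper's own construction. It assumes a balanced two-community SBM and uses edge-level randomized response, a standard mechanism that satisfies $\epsilon$-edge DP with $\delta = 0$. The function names (`sample_sbm`, `randomized_response`, `effective_params`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_sbm(n, p, q, rng):
    """Sample a two-community SBM (assumed balanced; labels are +1 / -1).

    Returns (labels, edges), where edges is a set of pairs (i, j) with i < j.
    """
    labels = [1 if i < n // 2 else -1 for i in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                edges.add((i, j))
    return labels, edges


def randomized_response(n, edges, epsilon, rng):
    """Flip every potential edge independently with probability 1 / (1 + e^eps).

    For any single edge, the ratio between the probabilities of reporting
    its true and its flipped value is e^eps, which gives eps-edge DP.
    """
    flip = 1.0 / (1.0 + math.exp(epsilon))
    noisy = set()
    for i in range(n):
        for j in range(i + 1, n):
            present = (i, j) in edges
            if rng.random() < flip:
                present = not present
            if present:
                noisy.add((i, j))
    return noisy


def effective_params(p, q, epsilon):
    """Connection probabilities of the perturbed graph, which is again an SBM."""
    flip = 1.0 / (1.0 + math.exp(epsilon))
    p_noisy = p * (1.0 - flip) + (1.0 - p) * flip
    q_noisy = q * (1.0 - flip) + (1.0 - q) * flip
    return p_noisy, q_noisy


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, q, eps = 200, 0.30, 0.05, 1.0  # illustrative values only
    labels, edges = sample_sbm(n, p, q, rng)
    noisy_edges = randomized_response(n, edges, eps, rng)
    print(len(edges), len(noisy_edges), effective_params(p, q, eps))
```

Under this assumed mechanism the perturbed graph has parameters $p'$ and $q'$ with $p' - q' = (p - q)\tanh(\epsilon/2)$, so the gap between intra-community and inter-community densities shrinks as $\epsilon$ decreases. This gives some intuition, though not a proof, for why perturbation-based approaches call for a larger privacy budget than the other two classes.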