Modern multi-layer networks are commonly stored and analyzed in a local and distributed fashion because of privacy, ownership, and communication costs. The literature on model-based statistical methods for community detection on such data is still limited. This paper proposes a new method for consensus community detection and estimation in a multi-layer stochastic block model using locally stored and computed network data with privacy protection. A novel algorithm named privacy-preserving Distributed Spectral Clustering (ppDSC) is developed. To preserve the privacy of the edges, we adopt the randomized response (RR) mechanism to perturb the network edges, which satisfies the strong notion of differential privacy. The ppDSC algorithm is performed on the squared RR-perturbed adjacency matrices to prevent possible cancellation of communities among different layers. To remove the bias incurred by RR and by squaring the network matrices, we develop a two-step bias-adjustment procedure. We then perform eigen-decomposition on the debiased matrices, aggregate the local eigenvectors using an orthogonal Procrustes transformation, and apply k-means clustering. We provide a theoretical analysis of the statistical errors of ppDSC in terms of eigenvector estimation. In addition, our bounds explain the blessings and curses of network heterogeneity.
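The pipeline described above (RR perturbation, bias adjustment, local eigen-decomposition, Procrustes aggregation, k-means) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: a single symmetric keep probability q = e^ε / (1 + e^ε) for RR; a second bias adjustment that simply zeroes the diagonal of the squared matrix, since the abstract does not spell out that step; the first layer's eigenvectors as the Procrustes reference; and a plain Lloyd-style k-means. All function names (`rr_perturb`, `debias`, `top_eigvecs`, `procrustes_average`, `kmeans`, `ppdsc_sketch`) are hypothetical.

```python
import numpy as np

def rr_perturb(A, eps, rng):
    """Randomized response: keep each edge indicator with prob. q, flip it otherwise.
    Assumes A is a symmetric 0/1 integer matrix with a zero diagonal."""
    q = np.exp(eps) / (1.0 + np.exp(eps))          # keep probability (assumed symmetric)
    n = A.shape[0]
    flip = np.triu(rng.random((n, n)) > q, k=1)    # one flip decision per node pair
    flip = flip | flip.T                           # mirror to keep the matrix symmetric
    A_tilde = np.where(flip, 1 - A, A)
    np.fill_diagonal(A_tilde, 0)
    return A_tilde, q

def debias(A_tilde, q):
    """Step 1: invert E[A_tilde] = (2q - 1) A + (1 - q) off the diagonal.
    Step 2 (stand-in, an assumption): zero the diagonal of the squared matrix."""
    n = A_tilde.shape[0]
    off_diag = 1.0 - np.eye(n)
    A_bar = (A_tilde - (1.0 - q) * off_diag) / (2.0 * q - 1.0)
    M = A_bar @ A_bar
    np.fill_diagonal(M, 0.0)
    return M

def top_eigvecs(M, K):
    """Eigenvectors for the K largest eigenvalues of a symmetric matrix."""
    _, vecs = np.linalg.eigh(M)                    # eigenvalues in ascending order
    return vecs[:, -K:]

def procrustes_average(V_list):
    """Rotate each local basis onto the first one, average, re-orthonormalize."""
    V_ref = V_list[0]
    aligned = []
    for V in V_list:
        U, _, Wt = np.linalg.svd(V.T @ V_ref)      # argmin_Z ||V Z - V_ref||_F is U @ Wt
        aligned.append(V @ (U @ Wt))
    Q, _ = np.linalg.qr(np.mean(aligned, axis=0))
    return Q

def kmeans(X, K, rng, n_iter=50):
    """Plain Lloyd iterations with random initial centers."""
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):                # leave empty clusters unchanged
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def ppdsc_sketch(A_list, K, eps, seed=0):
    """Run the whole sketch on a list of layer adjacency matrices."""
    rng = np.random.default_rng(seed)
    V_list = []
    for A in A_list:                               # in practice each layer is processed locally
        A_tilde, q = rr_perturb(A, eps, rng)
        V_list.append(top_eigvecs(debias(A_tilde, q), K))
    return kmeans(procrustes_average(V_list), K, rng)
```

Calling `ppdsc_sketch([A1, A2, A3], K=2, eps=2.0)` on symmetric 0/1 integer matrices returns one community label per node. Note that only the K leading eigenvectors of each layer, not the perturbed matrices themselves, enter the aggregation step, which is what keeps the communication cost low in this sketch.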