On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models

In a graph bisection problem, we are given a graph $G$ with two equally-sized unlabeled communities, and the goal is to recover the vertices in these communities. A popular heuristic, known as spectral clustering, is to output an estimated community assignment based on the eigenvector corresponding to the second smallest eigenvalue of the Laplacian of $G$. Spectral algorithms can be shown to provably recover the cluster structure for graphs generated from certain probabilistic models, such as the Stochastic Block Model (SBM). However, spectral clustering is known to be non-robust to model mis-specification. Techniques based on semidefinite programming have been shown to be more robust, but they incur significant computational overheads. In this work, we study the robustness of spectral algorithms against semirandom adversaries. Informally, a semirandom adversary is allowed to ``helpfully'' change the specification of the model in a way that is consistent with the ground-truth solution. Our semirandom adversaries in particular are allowed to add edges inside clusters or increase the probability that an edge appears inside a cluster. Semirandom adversaries are a useful tool to determine the extent to which an algorithm has overfit to statistical assumptions on the input. On the positive side, we identify classes of semirandom adversaries under which spectral bisection using the _unnormalized_ Laplacian is strongly consistent, i.e., it exactly recovers the planted partitioning. On the negative side, we show that in these classes spectral bisection with the _normalized_ Laplacian outputs a partitioning that makes a classification mistake on a constant fraction of the vertices. Finally, we demonstrate numerical experiments that complement our theoretical findings.
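The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes a dense two-community SBM with intra-cluster edge probability `p` and inter-cluster probability `q`, and a simple randomized stand-in for a semirandom adversary that only adds edges inside clusters. All function names and parameter values here are hypothetical choices for illustration and are not taken from the paper; the sketch covers only bisection with the unnormalized Laplacian $L = D - A$, not the paper's analysis or experiments.

```python
import numpy as np


def sample_sbm(n, p, q, rng):
    """Sample a symmetric SBM adjacency matrix with two equal communities."""
    labels = np.repeat([1, -1], n // 2)           # ground-truth assignment
    same = np.equal.outer(labels, labels)         # True where both endpoints share a cluster
    probs = np.where(same, p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample each pair once, no self-loops
    adj = (upper | upper.T).astype(float)
    return adj, labels


def add_intra_cluster_edges(adj, labels, extra_prob, rng):
    """Illustrative 'helpful' adversary: add edges only inside clusters."""
    n = adj.shape[0]
    same = np.equal.outer(labels, labels)
    extra = np.triu((rng.random((n, n)) < extra_prob) & same, k=1)
    extra = (extra | extra.T).astype(float)
    return np.maximum(adj, extra)                 # existing edges are never removed


def spectral_bisection(adj):
    """Split vertices by the sign of the second eigenvector of L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, eigvecs = np.linalg.eigh(lap)              # eigenvalues returned in ascending order
    fiedler = eigvecs[:, 1]                       # eigenvector of the second smallest eigenvalue
    return np.where(fiedler >= 0, 1, -1)


def error_rate(pred, labels):
    """Fraction of misclassified vertices, up to swapping the two labels."""
    mismatches = np.mean(pred != labels)
    return min(mismatches, 1.0 - mismatches)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj, labels = sample_sbm(300, 0.7, 0.05, rng)
    noisy = add_intra_cluster_edges(adj, labels, 0.2, rng)
    print("SBM error rate:       ", error_rate(spectral_bisection(adj), labels))
    print("semirandom error rate:", error_rate(spectral_bisection(noisy), labels))
```

One way to see why added intra-cluster edges are benign for this sketch: they add a positive semidefinite Laplacian term that vanishes on the community indicator vector, so the planted cut direction is left unchanged while other directions can only be penalized more. The normalized Laplacian, for which the abstract reports a negative result, rescales by degrees and does not share this property in general.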

