We address the problem of designing a sublinear-time spectral clustering oracle for graphs that exhibit strong clusterability. Such graphs contain $k$ latent clusters, each characterized by a large inner conductance (at least $\varphi$) and a small outer conductance (at most $\varepsilon$). Our aim is to preprocess the graph to enable clustering membership queries, with the key requirement that both preprocessing and query answering should be performed in sublinear time, and the resulting partition should be consistent with a $k$-partition that is close to the ground-truth clustering. Previous oracles have relied on either a $\textrm{poly}(k)\log n$ gap between inner and outer conductances or exponential (in $k/\varepsilon$) preprocessing time. Our algorithm relaxes these assumptions, albeit at the cost of a slightly higher misclassification ratio. We also show that our clustering oracle is robust against a few random edge deletions. To validate our theoretical bounds, we conducted experiments on synthetic networks.
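To make the clusterability condition concrete, the following is a minimal sketch of the two quantities the abstract refers to, computed by brute force on a toy graph. It assumes the common volume-normalized definition of conductance (cut edges divided by the sum of degrees); the paper itself may normalize differently, for example by a degree bound $d$ times the set size. The function names and the example graph are illustrative and not taken from the paper, and the inner-conductance routine enumerates all subsets, so it is only usable on very small clusters.

```python
from itertools import combinations


def volume(adj, nodes):
    """Sum of degrees of `nodes` in the adjacency-list graph `adj`."""
    return sum(len(adj[v]) for v in nodes)


def outer_conductance(adj, S):
    """Edges leaving S divided by vol(S) (assumed normalization)."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol = volume(adj, S)
    return cut / vol if vol else 0.0


def inner_conductance(adj, S):
    """Minimum conductance of any T inside the induced subgraph G[S],
    over subsets T holding at most half of the induced volume.
    Exponential time: for illustration only. Returns inf if no
    non-trivial subset qualifies (e.g. a single-vertex cluster)."""
    S = set(S)
    # Induced subgraph on S: keep only edges with both endpoints in S.
    sub = {u: [v for v in adj[u] if v in S] for u in S}
    total = volume(sub, S)
    best = float("inf")
    nodes = sorted(S)
    for r in range(1, len(nodes)):
        for T in combinations(nodes, r):
            vol_T = volume(sub, T)
            if 0 < vol_T <= total / 2:
                best = min(best, outer_conductance(sub, T))
    return best


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
cluster = {0, 1, 2}
phi_out = outer_conductance(adj, cluster)  # one cut edge over volume 7
phi_in = inner_conductance(adj, cluster)   # the triangle is well connected
```

Under these assumptions, the cluster `{0, 1, 2}` has a small outer conductance and a large inner conductance, which is the kind of gap between $\varepsilon$ and $\varphi$ that the abstract describes.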