We address the problem of designing a sublinear-time spectral clustering oracle for graphs that exhibit strong clusterability. Such graphs contain $k$ latent clusters, each characterized by a large inner conductance (at least $\varphi$) and a small outer conductance (at most $\varepsilon$). Our aim is to preprocess the graph to enable cluster membership queries, with the key requirement that both preprocessing and query answering run in sublinear time, and that the resulting partition be consistent with a $k$-partition that is close to the ground-truth clustering. Previous oracles have relied on either a $\textrm{poly}(k)\log n$ gap between inner and outer conductances or preprocessing time exponential in $k/\varepsilon$. Our algorithm relaxes these assumptions, albeit at the cost of a slightly higher misclassification ratio. We also show that our clustering oracle is robust against a few random edge deletions. To validate our theoretical bounds, we conducted experiments on synthetic networks.
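To make the two conductance conditions concrete, the following is a minimal sketch that computes the outer and inner conductance of one cluster in a toy graph. It is an illustration only, not the oracle described above: the function names, the toy graph, and the degree-based volume normalization are assumptions (the paper may normalize differently, for example by a degree bound), and the inner conductance is found by brute force, which is feasible only for tiny clusters.

```python
from itertools import combinations


def volume(adj, nodes):
    """Sum of degrees of `nodes` in the graph given by adjacency lists `adj`."""
    return sum(len(adj[v]) for v in nodes)


def outer_conductance(adj, cluster):
    """Edges leaving `cluster`, divided by the volume of `cluster` (assumed normalization)."""
    cluster = set(cluster)
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    return cut / volume(adj, cluster)


def inner_conductance(adj, cluster):
    """Minimum conductance over subsets of the subgraph induced by `cluster`.

    Brute force over all proper subsets whose volume is at most half of the
    induced subgraph's volume; exponential in the cluster size.
    """
    cluster = set(cluster)
    # Adjacency lists restricted to the cluster (the induced subgraph).
    sub = {u: [v for v in adj[u] if v in cluster] for u in cluster}
    total = volume(sub, cluster)
    best = float("inf")
    nodes = sorted(cluster)
    for r in range(1, len(nodes)):
        for subset in combinations(nodes, r):
            vol_s = volume(sub, subset)
            if vol_s == 0 or vol_s > total / 2:
                continue
            s = set(subset)
            cut = sum(1 for u in s for v in sub[u] if v not in s)
            best = min(best, cut / vol_s)
    return best


# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}

if __name__ == "__main__":
    print("outer conductance of {0,1,2}:", outer_conductance(adj, {0, 1, 2}))
    print("inner conductance of {0,1,2}:", inner_conductance(adj, {0, 1, 2}))
```

In this toy graph the cluster $\{0,1,2\}$ has outer conductance $1/7$ (one cut edge over volume 7) and inner conductance $1$, so it satisfies the conditions for any $\varphi \le 1$ and $\varepsilon \ge 1/7$ under the assumed normalization.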