Exact recovery in stochastic block models (SBMs) is well understood in undirected settings but remains considerably less developed for directed and sparse networks, particularly when the number of communities diverges. Spectral methods for directed SBMs are often unstable in asymmetric, low-degree regimes, and existing non-spectral approaches focus primarily on undirected or dense settings. We propose a fully non-spectral, two-stage procedure for community detection in sparse directed SBMs with a potentially growing number of communities. The method first estimates the directed edge-probability matrix using a neighborhood-smoothing scheme tailored to the asymmetric setting, and then applies $K$-means clustering to the rows of this estimate, thereby avoiding the limitations of eigen- and singular-value decompositions in sparse, asymmetric networks. Our main theoretical contribution is a uniform row-wise concentration bound for the smoothed estimator, obtained through new arguments that control asymmetric neighborhoods and separate in- and out-degree effects. These results imply exact recovery of all community labels with probability tending to one under mild sparsity and separation conditions that allow both $\gamma_n \to 0$ and $K_n \to \infty$. Simulation studies covering highly directed, sparse, and non-symmetric block structures show that the proposed procedure performs reliably in regimes where directed spectral and score-based methods deteriorate. To the best of our knowledge, this is the first exact recovery guarantee for this class of non-spectral, neighborhood-smoothing methods in the sparse, directed setting.
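The two-stage procedure (smooth the adjacency matrix row by row, then cluster the rows) can be illustrated with a minimal Python sketch. It is not the authors' estimator: the directed dissimilarity used here, which simply adds an out-row and an in-column version of the undirected neighborhood-smoothing distance of Zhang, Levina, and Zhu (2017), the default quantile level $h = \sqrt{\log n / n}$, the plain Lloyd's $K$-means, and the helper names `directed_nbhd_smoothing`, `kmeans`, `simulate_directed_sbm`, and `matched_accuracy` are all hypothetical choices made for illustration.

```python
import itertools

import numpy as np


def directed_nbhd_smoothing(A, h=None):
    """Sketch of a directed neighborhood-smoothing estimate of P from adjacency A.

    HYPOTHETICAL: the dissimilarity adds out-row and in-column versions of the
    Zhang-Levina-Zhu distance; the paper's exact neighborhood rule may differ.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if h is None:
        h = np.sqrt(np.log(n) / n)  # neighborhood quantile level (assumed default)
    out_gram = A @ A.T / n  # out_gram[i, k] = <A[i, :], A[k, :]> / n
    in_gram = A.T @ A / n   # in_gram[i, k]  = <A[:, i], A[:, k]> / n
    D = np.zeros((n, n))
    for i in range(n):
        parts = []
        for G in (out_gram, in_gram):
            M = np.abs(G - G[i])      # row j holds |G[j, k] - G[i, k]| for all k
            M[:, i] = 0.0             # exclude k = i (zeros never raise the max)
            np.fill_diagonal(M, 0.0)  # exclude k = j in row j
            parts.append(M.max(axis=1))
        D[i] = parts[0] + parts[1]
    P_hat = np.empty_like(A)
    for i in range(n):
        # Neighbors of i: nodes whose distance to i is below the h-th quantile.
        nbrs = np.flatnonzero(D[i] <= np.quantile(D[i], h))  # includes i, D[i, i] = 0
        P_hat[i] = A[nbrs].mean(axis=0)  # average the out-rows of i's neighbors
    return P_hat


def kmeans(X, K, n_iter=100, seed=0):
    """Plain Lloyd's K-means on the rows of X with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    C = X[[rng.integers(len(X))]]
    while len(C) < K:
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        C = np.vstack([C, X[np.argmax(d2)]])
    for _ in range(n_iter):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_C = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else C[k]
                          for k in range(K)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


def simulate_directed_sbm(n_per, B, seed=0):
    """Draw a directed SBM (no self-loops) with n_per nodes per block."""
    rng = np.random.default_rng(seed)
    z = np.repeat(np.arange(len(B)), n_per)
    P = B[z][:, z]  # P[i, j] = B[z_i, z_j]
    A = (rng.random(P.shape) < P).astype(float)
    np.fill_diagonal(A, 0.0)
    return A, z


def matched_accuracy(z, labels, K):
    """Fraction of nodes labelled correctly under the best label permutation."""
    return max(np.mean(np.array(perm)[labels] == z)
               for perm in itertools.permutations(range(K)))


# Asymmetric (cyclic) block matrix, so B != B.T and edge direction matters.
B = np.array([[0.7, 0.5, 0.05],
              [0.05, 0.7, 0.5],
              [0.5, 0.05, 0.7]])
A, z = simulate_directed_sbm(50, B, seed=1)
P_hat = directed_nbhd_smoothing(A)
labels = kmeans(P_hat, K=3)
print(matched_accuracy(z, labels, 3))
```

The dissimilarity step is computed naively in $O(n^3)$ time, so this sketch is only meant for small illustrative graphs; it makes no attempt to reproduce the sparsity regime ($\gamma_n \to 0$) or the growing-$K_n$ setting analysed in the theory.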