Clustering bipartite graphs is a fundamental task in network analysis. In the high-dimensional regime where the number of rows $n_1$ and the number of columns $n_2$ of the associated adjacency matrix are of different orders, existing methods derived from those used for symmetric graphs can come with sub-optimal guarantees. Due to the increasing number of applications for bipartite graphs in the high-dimensional regime, it is of fundamental importance to design optimal algorithms for this setting. The recent work of Ndaoud et al. (2022) improves the existing upper bound for the misclustering rate in the special case where the columns (resp. rows) can be partitioned into $L = 2$ (resp. $K = 2$) communities. Unfortunately, their algorithm cannot be extended to the more general setting where $K \neq L \geq 2$. We overcome this limitation by introducing a new algorithm based on the power method. We derive conditions for exact recovery in the general setting where $K \neq L \geq 2$, and show that it recovers the result of Ndaoud et al. (2022). We also derive a minimax lower bound on the misclustering error when $K = L$ under a symmetric version of our model, which matches the corresponding upper bound up to a factor depending on $K$.
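To make the setting concrete, the following is a minimal sketch of a generic power-method-style spectral pipeline for clustering the rows of a bipartite adjacency matrix $A \in \{0,1\}^{n_1 \times n_2}$ into $K$ groups. It is not the algorithm analyzed in this work, whose details the abstract does not give. The use of the Gram matrix $AA^\top$ with its diagonal zeroed, the orthogonal (block power) iteration, the plain k-means step, and all function names (`orthogonal_iteration`, `kmeans`, `cluster_rows`) are assumptions made purely for illustration.

```python
import numpy as np


def orthogonal_iteration(B, K, n_iter=100, seed=0):
    """Block power method: approximate the K dominant eigenvectors of symmetric B."""
    rng = np.random.default_rng(seed)
    # Random orthonormal starting basis (illustrative choice).
    Q, _ = np.linalg.qr(rng.standard_normal((B.shape[0], K)))
    for _ in range(n_iter):
        # Multiply by B, then re-orthonormalize to keep the basis well conditioned.
        Q, _ = np.linalg.qr(B @ Q)
    return Q


def kmeans(X, K, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, K):
        # Squared distance from each point to its nearest chosen center.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def cluster_rows(A, K):
    """Cluster the n1 rows of a bipartite adjacency matrix A (n1 x n2) into K groups."""
    A = np.asarray(A, dtype=float)
    B = A @ A.T                 # n1 x n1 Gram matrix of the rows
    np.fill_diagonal(B, 0.0)    # assumption: zero the diagonal to reduce its bias
    U = orthogonal_iteration(B, K)
    return kmeans(U, K)
```

Calling `cluster_rows(A, K)` returns one integer label per row. Working with the $n_1 \times n_1$ matrix $AA^\top$ rather than with $A$ directly is a common choice when $n_2 \gg n_1$, since the cost of each power iteration then depends only on $n_1$ once the Gram matrix has been formed.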