Bipartite networks, which encode interactions between two distinct types of entities, arise in many applications and are inherently asymmetric across the two node sets. Despite a growing literature on bipartite community detection, estimating the numbers of communities $(K_1, K_2)$, a critical step in bipartite network analysis, remains theoretically underdeveloped: to our knowledge, no model selection consistency result has been established. The inherent asymmetry and the two-dimensional parameter space, in which $K_1$ and $K_2$ may differ drastically, pose challenges that do not arise in the unipartite case. In particular, a candidate model may overfit one node set while simultaneously underfitting the other. To address these challenges, we propose Bipartite Cross-Validation (BCV), a penalized cross-validation framework that jointly selects $(K_1, K_2)$ in a fully data-driven manner. We establish the first model selection consistency result for bipartite networks; it notably accommodates the regime in which the numbers of communities scale with the network size, and it reveals the intricate interplay between sparsity and model complexity. Simulations and real-data applications demonstrate the strong finite-sample performance of BCV.