Although network data have become increasingly popular and widely studied, the vast majority of statistical literature has focused on unipartite networks, leaving relatively few theoretical results for bipartite networks. In this paper, we study the model selection problem for bipartite stochastic block models. We propose a penalized cross-validation approach that incorporates appropriate penalty terms for different candidate models, addressing the new and challenging issue that underfitting may occur on one side while overfitting occurs on the other. To the best of our knowledge, our method provides the first consistency guarantee for model selection in bipartite networks. Through simulations under various scenarios and analysis of two real datasets, we demonstrate that our approach not only outperforms traditional modularity-based and projection-based methods, but also naturally preserves potential asymmetry between the two node sets.