Large, complex networks are becoming increasingly prevalent in scientific applications across many domains. Although a number of models and methods exist for such networks, cross-validation on networks remains challenging due to the unique structure of network data. In this paper, we propose a general cross-validation procedure called NETCROP (NETwork CRoss-Validation using Overlapping Partitions). The key idea is to divide the original network into multiple subnetworks that share a common overlap, producing training sets consisting of the subnetworks and a test set consisting of the node pairs between them. This train-test split provides the basis for a network cross-validation procedure that can be applied to a wide range of model selection and parameter tuning problems for networks. The method is computationally efficient for large networks because the training step operates on smaller subnetworks. We provide methodological details and theoretical guarantees for several model selection and parameter tuning tasks using NETCROP. Numerical results demonstrate that NETCROP performs accurate cross-validation on a diverse set of network model selection and parameter tuning problems. They also indicate that NETCROP is substantially faster and often more accurate than existing methods for network cross-validation.
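To make the train-test split concrete, the sketch below gives a minimal Python illustration of the overlapping-partition idea described above. The function name netcrop_split, the overlap fraction, and the uniform random sampling of the overlap set are our own illustrative assumptions, not the paper's reference implementation.

```python
# A minimal sketch of a NETCROP-style train-test split, assuming the network
# is given as a dense adjacency matrix A. All parameter choices here
# (K, overlap_frac, uniform sampling) are illustrative assumptions.
import numpy as np

def netcrop_split(A, K=2, overlap_frac=0.2, seed=None):
    """Split the nodes into K parts plus a shared overlap set.

    Returns the node sets of the K training subnetworks (each part joined
    with the overlap), the induced training subnetworks, and the test set
    of node pairs whose endpoints lie in two different parts.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    perm = rng.permutation(n)

    # Shared overlap: a random subset of nodes common to every subnetwork.
    n_overlap = int(overlap_frac * n)
    overlap = perm[:n_overlap]

    # Partition the remaining nodes into K disjoint parts.
    parts = np.array_split(perm[n_overlap:], K)

    # Each training subnetwork is the subgraph induced by one part plus the overlap.
    train_nodes = [np.concatenate([part, overlap]) for part in parts]
    train_subnets = [A[np.ix_(nodes, nodes)] for nodes in train_nodes]

    # Test set: node pairs between distinct parts (not observed in any training subnetwork).
    test_pairs = [(u, v)
                  for i in range(K) for j in range(i + 1, K)
                  for u in parts[i] for v in parts[j]]
    return train_nodes, train_subnets, test_pairs
```

Under this sketch, a model fitted on each training subnetwork can be scored on the held-out between-part pairs, and repeating the split with fresh random partitions yields the multiple folds used for cross-validation.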