Group number selection is a key question in grouped panel data modelling. In this work, we develop a cross-validation method to address this problem. Specifically, we split the panel data along the time dimension into a training dataset and a testing dataset. We first use the training dataset to estimate the model parameters and group memberships. We then apply the fitted model to the testing dataset and estimate the group number by minimizing the value of a loss function on the testing dataset. We design loss functions for panel data models both with and without fixed effects. The proposed method has two advantages. First, it is fully data-driven, so no additional tuning parameters are involved. Second, it can be flexibly applied to a wide range of panel data models. Theoretically, we establish estimation consistency by exploiting the optimization property of the estimation algorithm. Experiments on a variety of synthetic and empirical datasets further illustrate the advantages of the proposed method.
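The time-split procedure can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear grouped panel model without fixed effects, y_it = x_it' beta_{g_i} + e_it, fitted by k-means-style grouped least squares on the training periods. The function names (`fit_grouped_ls`, `holdout_loss`, `select_group_number`, `simulate_grouped_panel`) are hypothetical, and the plain out-of-sample mean squared error used as the testing loss is an illustrative stand-in for the loss functions designed in the paper.

```python
import numpy as np


def fit_grouped_ls(y, X, G, n_iter=50, n_starts=10, seed=0):
    """Grouped least squares y[i, t] = X[i, t] @ beta[g_i] + e[i, t], fitted by
    alternating group-wise OLS and unit reassignment (k-means style).
    y: (N, T) outcomes; X: (N, T, p) covariates.
    Returns (beta (G, p), labels (N,)) from the start with the lowest SSR."""
    rng = np.random.default_rng(seed)
    N, T, p = X.shape
    best_obj, best = np.inf, None
    for _ in range(n_starts):
        labels = rng.integers(G, size=N)
        for _ in range(n_iter):
            beta = np.zeros((G, p))
            for g in range(G):
                members = labels == g
                if not members.any():
                    # Re-seed an empty group with one randomly chosen unit.
                    members = np.zeros(N, dtype=bool)
                    members[rng.integers(N)] = True
                beta[g] = np.linalg.lstsq(X[members].reshape(-1, p),
                                          y[members].reshape(-1), rcond=None)[0]
            # ssr[i, g]: residual sum of squares of unit i under group g's beta.
            resid = y[:, None, :] - np.einsum("ntp,gp->ngt", X, beta)
            ssr = (resid ** 2).sum(axis=2)
            new_labels = ssr.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        obj = ssr[np.arange(N), labels].sum()
        if obj < best_obj:
            best_obj, best = obj, (beta, labels)
    return best


def holdout_loss(y, X, G, T_train, seed=0):
    """Fit on periods [0, T_train), then score the remaining periods using each
    unit's training group label and that group's training coefficients.
    Assumption: plain out-of-sample MSE (illustrative, not the paper's loss)."""
    beta, labels = fit_grouped_ls(y[:, :T_train], X[:, :T_train], G, seed=seed)
    pred = np.einsum("ntp,np->nt", X[:, T_train:], beta[labels])
    return float(np.mean((y[:, T_train:] - pred) ** 2))


def select_group_number(y, X, G_max, train_frac=0.5, seed=0):
    """Return the G in 1..G_max with the smallest hold-out loss, plus all losses."""
    T_train = int(y.shape[1] * train_frac)
    losses = [holdout_loss(y, X, G, T_train, seed) for G in range(1, G_max + 1)]
    return int(np.argmin(losses)) + 1, losses


def simulate_grouped_panel(N, T, betas, sigma=1.0, seed=0):
    """Hypothetical data generator: units assigned round-robin to rows of betas."""
    rng = np.random.default_rng(seed)
    G, p = betas.shape
    true_labels = np.arange(N) % G
    X = rng.normal(size=(N, T, p))
    y = np.einsum("ntp,np->nt", X, betas[true_labels]) + sigma * rng.normal(size=(N, T))
    return y, X, true_labels


# Usage on synthetic data with two well-separated groups.
y, X, _ = simulate_grouped_panel(60, 20, np.array([[2.0, 0.0], [-2.0, 0.0]]), seed=1)
G_hat, losses = select_group_number(y, X, G_max=3)
print(G_hat, losses)
```

The training periods precede the testing periods, so the split respects the time ordering, and each unit keeps the group label learned on the training periods when it is scored on the testing periods. Models with fixed effects would need a different testing loss, which this sketch does not attempt.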