We consider the problem of testing for differences in group-specific slopes between selected groups in panel data, where the groups are identified via k-means clustering. In this setting, the classical Wald-type test statistic is problematic because it produces a severely inflated type I error probability. The underlying reason is that the same dataset is used both to identify the group structure and to construct the test statistic, which creates dependence between the selection and inference stages. To address this issue, we propose a valid selective inference approach that conditions on the selection event to account for the selection effect. We formally define the selective type I error and describe how to efficiently compute the correct p-values for clusters obtained using k-means clustering. Furthermore, the same idea can be extended to test for differences in the coefficients on a single covariate and can be incorporated into the generalized method of moments (GMM) estimation framework. Simulation studies show that our method has satisfactory finite-sample performance. We apply this method to explore the heterogeneous relationships between economic growth and $\mathrm{CO}_2$ emissions across countries, and the analysis yields some new findings. An R package, TestHomoPanel, is provided to implement the proposed selective inference framework for panel data.
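The inflation described above can be illustrated with a small simulation. The sketch below is not the proposed selective inference procedure and does not use the TestHomoPanel package. It is a minimal Python illustration under assumptions of our own: a single regressor with no intercept, a known unit error variance, two clusters, and k-means applied to unit-wise OLS slopes. The function names `two_means_1d` and `naive_rejection_rate` are hypothetical. All units share the same slope, so the null hypothesis is true, and the naive Wald test is applied either to groups found by k-means or to a fixed split chosen without looking at the data.

```python
import math
import numpy as np


def two_means_1d(values, n_iter=50):
    """Lloyd's algorithm with k=2 on scalars, initialised at the min and max."""
    centers = np.array([values.min(), values.max()], dtype=float)
    labels = np.zeros(values.shape[0], dtype=int)
    for _ in range(n_iter):
        # Assign each value to the nearer centre (ties go to cluster 0).
        labels = (np.abs(values - centers[0]) > np.abs(values - centers[1])).astype(int)
        new_centers = np.array([values[labels == g].mean() for g in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def naive_rejection_rate(split="kmeans", n_units=40, n_periods=10,
                         n_reps=500, alpha=0.05, seed=0):
    """Share of replications in which the naive Wald test rejects a true null."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        x = rng.normal(size=(n_units, n_periods))
        # Common slope of 1 for every unit: no true group differences.
        y = 1.0 * x + rng.normal(size=(n_units, n_periods))
        sxx = (x ** 2).sum(axis=1)
        sxy = (x * y).sum(axis=1)
        if split == "kmeans":
            # Groups are selected from the same data that is tested below.
            labels = two_means_1d(sxy / sxx)
        else:
            # Data-independent split: alternate units between the two groups.
            labels = np.arange(n_units) % 2
        slopes, variances = [], []
        for g in (0, 1):
            in_group = labels == g
            slopes.append(sxy[in_group].sum() / sxx[in_group].sum())
            variances.append(1.0 / sxx[in_group].sum())  # error variance assumed known (= 1)
        wald = (slopes[1] - slopes[0]) ** 2 / (variances[0] + variances[1])
        p_value = math.erfc(math.sqrt(wald / 2.0))  # chi-square(1) upper tail
        rejections += p_value < alpha
    return rejections / n_reps


if __name__ == "__main__":
    print("k-means groups:", naive_rejection_rate(split="kmeans"))
    print("fixed groups:  ", naive_rejection_rate(split="fixed"))
```

With the fixed split, the Wald statistic is exactly chi-square with one degree of freedom under these assumptions, so the rejection rate should sit near the nominal level. With k-means groups, the clustering step separates low and high slope estimates by construction, so the same statistic rejects far more often than the nominal level even though no true difference exists.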