Subgroup analysis has attracted growing attention for its ability to identify meaningful subgroups in a heterogeneous population and thereby improve predictive power. However, in many fields such as social science and biology, the covariates may be highly correlated due to the existence of common factors, which poses great challenges for group identification and has been neglected in the existing literature. In this paper, we aim to fill this gap in the ``diverging dimension'' regime and propose a center-augmented subgroup identification method under the Factor Augmented (sparse) Linear Model framework, which bridges dimension reduction and sparse regression. The proposed method accommodates possibly high cross-sectional dependence among covariates and inherits a computational advantage, with complexity $O(nK)$ in contrast to the $O(n^2)$ complexity of the conventional pairwise fusion penalty method in the literature, where $n$ is the sample size and $K$ is the number of subgroups. We also investigate the asymptotic properties of its oracle estimators under conditions on the minimal distance between group centroids. To implement the proposed approach, we introduce a Difference of Convex functions based Alternating Direction Method of Multipliers (DC-ADMM) algorithm and show that it converges to a local minimizer in a finite number of steps. We illustrate the superiority of the proposed method through extensive numerical experiments and a real macroeconomic data example. An \texttt{R} package \texttt{SILFS} implementing the method is also available on CRAN.