We introduce a general semiparametric clusterwise elliptical distribution to assess how latent cluster structure shapes continuous outcomes. Using a subjectwise representation, we first estimate cluster-specific mean vectors and a cluster-invariant scatter matrix by minimizing a weighted sum of squares criterion augmented with a separation penalty; we provide an initialization scheme and a computational algorithm with guaranteed convergence. This initial estimator consistently recovers the true clusters and seeds a second phase that alternates pseudo-maximum likelihood (or pseudo-maximum marginal likelihood) estimation with cluster reassignment, yielding asymptotic semiparametric efficiency and an optimal clustering that asymptotically maximizes the probability of correct membership. We also propose a semiparametric information criterion for selecting the number of clusters. Monte Carlo simulations and empirical applications demonstrate strong finite-sample performance and practical value.
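The first-phase idea of estimating cluster-specific means together with a cluster-invariant scatter matrix can be illustrated with a minimal sketch. It is not the paper's estimator: the abstract does not specify the weights, the separation penalty, or the initialization scheme, so all three are omitted or replaced here. The sketch assumes an unweighted sum of squares, a random-observation initialization, and a plain alternation between Mahalanobis-distance assignment and mean / pooled-scatter updates. The function name `fit_shared_scatter_clusters` and its parameters are hypothetical.

```python
import numpy as np

def fit_shared_scatter_clusters(X, k, n_iter=50, seed=0):
    """Alternate Mahalanobis assignment with mean and pooled-scatter updates.

    Assumption: no weights and no separation penalty (unspecified in the abstract).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Hypothetical initialization: k distinct observations serve as starting means.
    means = X[rng.choice(n, size=k, replace=False)].copy()
    scatter = np.eye(d)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment step: squared Mahalanobis distance to each cluster mean,
        # computed with the single scatter matrix shared by all clusters.
        precision = np.linalg.inv(scatter)
        diffs = X[:, None, :] - means[None, :, :]  # shape (n, k, d)
        dist2 = np.einsum("nkd,de,nke->nk", diffs, precision, diffs)
        new_labels = dist2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so stop iterating
        labels = new_labels
        # Update step: cluster-specific means (empty clusters keep their old mean).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                means[j] = members.mean(axis=0)
        # Update step: pooled within-cluster scatter, with a small ridge
        # added so that the matrix stays invertible.
        resid = X - means[labels]
        scatter = resid.T @ resid / n + 1e-8 * np.eye(d)
    return labels, means, scatter
```

Calling `fit_shared_scatter_clusters(X, k=2)` on an `(n, d)` array returns the labels, a `(k, d)` array of means, and a single `(d, d)` scatter matrix. In the approach the abstract describes, an output of this kind would only seed the second, likelihood-based phase.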