Regression models in which both the responses and the covariates are high-dimensional have attracted growing attention. Standard multivariate regression models become inadequate when the response variables depend not only on observed covariates but also on latent variables that capture key unobserved characteristics. To draw statistical inferences on covariate effects while accounting for latent variables, we consider a high-dimensional generalized latent variable model that accommodates mixed-type responses and allows flexible dependence between covariates and latent variables. This makes it better suited to many real-world applications than existing methods, which either rely on a linear regression form or impose restrictive assumptions on the dependence between covariates and latent variables. We develop an alternating algorithm that iteratively updates the regression parameters and the latent variables, transforming an intractable nonconvex problem into a sequence of tractable convex subproblems. Theoretically, we provide algorithmic guarantees by establishing the statistical consistency of the resulting estimator and deriving an error bound for it. Building on this estimator, we construct a debiased estimator for the covariate effect and establish its asymptotic normality. We demonstrate the effectiveness of the proposed method through an application to evaluating the fairness of the Programme for International Student Assessment (PISA).
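The alternating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes purely binary responses with a logistic link, uses plain gradient steps in place of a proper convex solver, and omits the identifiability constraints, penalties, and debiasing step that the full method would need. All names (`alternating_fit`, `neg_log_lik`, the matrices `B`, `G`, `U`) and all tuning values are hypothetical. With the latent variables `U` held fixed the objective is convex in the coefficients `(B, G)`, and with `(B, G)` held fixed it is convex in `U`, which is what the two inner loops exploit.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def neg_log_lik(Y, X, U, B, G):
    # Average Bernoulli negative log-likelihood with a logit link.
    # np.logaddexp(0, eta) is a stable way to compute log(1 + exp(eta)).
    eta = X @ B + U @ G
    return np.sum(np.logaddexp(0.0, eta) - Y * eta) / Y.size


def alternating_fit(Y, X, k, n_outer=30, n_inner=5, step=0.25, seed=0):
    """Alternate gradient steps over (B, G) and U.

    Y: (n, q) binary responses; X: (n, p) observed covariates;
    k: assumed number of latent variables (hypothetical input).
    """
    rng = np.random.default_rng(seed)
    n, q = Y.shape
    p = X.shape[1]
    B = np.zeros((p, q))                    # covariate effects
    G = 0.1 * rng.standard_normal((k, q))   # loadings on latent variables
    U = 0.1 * rng.standard_normal((n, k))   # latent variables
    history = [neg_log_lik(Y, X, U, B, G)]

    for _ in range(n_outer):
        # Block 1: U fixed, so the problem is convex in (B, G).
        for _ in range(n_inner):
            R = sigmoid(X @ B + U @ G) - Y  # residuals, shape (n, q)
            B -= step * (X.T @ R) / n
            G -= step * (U.T @ R) / n
        # Block 2: (B, G) fixed, so the problem is convex in U.
        for _ in range(n_inner):
            R = sigmoid(X @ B + U @ G) - Y
            U -= step * (R @ G.T) / q
        history.append(neg_log_lik(Y, X, U, B, G))

    return B, G, U, history


if __name__ == "__main__":
    # Synthetic data in which the latent variables depend on the covariates.
    rng = np.random.default_rng(1)
    n, p, q, k = 200, 3, 10, 2
    X = rng.standard_normal((n, p))
    U_true = 0.5 * X[:, :k] + rng.standard_normal((n, k))
    B_true = rng.standard_normal((p, q))
    G_true = rng.standard_normal((k, q))
    probs = sigmoid(X @ B_true + U_true @ G_true)
    Y = (rng.random((n, q)) < probs).astype(float)

    B_hat, G_hat, U_hat, history = alternating_fit(Y, X, k=k)
    print("initial loss:", history[0], "final loss:", history[-1])
```

Without further constraints the pair `(U, G)` is identified only up to an invertible transformation, and part of the covariate effect can be absorbed into the latent term, so the `B_hat` returned here should not be read as an estimate suitable for inference. The sketch only shows how the nonconvex joint problem splits into convex blocks.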