The grade of membership model is a flexible latent variable model for analyzing multivariate categorical data through individual-level mixed membership scores. In many modern applications, auxiliary covariates are collected alongside the responses and carry information about the same latent structure. Traditional approaches to incorporating such covariates typically rely on fully specified joint likelihoods, which are computationally intensive and sensitive to misspecification. We introduce a covariate-assisted grade of membership model that integrates response and covariate information by exploiting their shared low-rank simplex geometry rather than modeling their joint distribution. We propose a likelihood-free spectral estimation procedure that combines these heterogeneous data sources through a balance parameter controlling their relative contributions. To accommodate high-dimensional, heteroskedastic noise, we apply heteroskedastic principal component analysis (HeteroPCA) before simplex-based geometric recovery. Our theoretical analysis establishes identifiability under weaker conditions than the covariate-free model requires, and derives finite-sample, entrywise error bounds for both the mixed membership scores and the item parameters. These results show that auxiliary covariates can provably improve latent structure recovery, yielding faster convergence rates in high-dimensional regimes. Simulation studies and an application to educational assessment data illustrate the computational efficiency, statistical accuracy, and interpretability gains of the proposed method. The code for reproducing these results is open-source and available at \texttt{https://github.com/Toby-X/Covariate-Assisted-GoM}.
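The pipeline summarized above has three stages: combine the two sources with a balance parameter, denoise with HeteroPCA, and recover the simplex geometry. It can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions and is not the implementation in the linked repository. The sketch makes four assumptions:

- The balance parameter lam enters by stacking the response matrix R (N x J) and the covariate matrix X (N x p) as [R, sqrt(lam) X], so the Gram matrix is R R^T + lam X X^T.
- HeteroPCA takes a basic diagonal-deletion and diagonal-imputation form.
- The vertex-hunting step uses the successive projection algorithm (SPA), a swapped-in standard choice.
- Item parameters are estimated by least squares.

The function names `hetero_pca`, `spa`, and `covariate_assisted_gom` are hypothetical. The geometric idea is this: if R and X have expectations Pi Theta^T and Pi Gamma^T with the same membership matrix Pi, the leading left singular vectors of the stacked matrix equal Pi V for an invertible K x K matrix V. Their rows therefore lie in a simplex whose vertices are the rows belonging to pure subjects.

```python
import numpy as np


def hetero_pca(M, K, n_iter=100):
    """Top-K left singular subspace of M via a basic HeteroPCA-style iteration.

    Heteroskedastic noise biases the diagonal of the Gram matrix M M^T, so the
    diagonal is deleted and then repeatedly refilled from a rank-K approximation.
    """
    G = M @ M.T
    np.fill_diagonal(G, 0.0)  # diagonal deletion
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(G)  # eigenvalues in ascending order
        top_vals, top_vecs = vals[-K:], vecs[:, -K:]
        low_rank = (top_vecs * top_vals) @ top_vecs.T
        np.fill_diagonal(G, np.diag(low_rank))  # impute the diagonal only
    _, vecs = np.linalg.eigh(G)
    return vecs[:, -K:]  # N x K; rows lie (approximately) in a K-vertex simplex


def spa(U, K):
    """Successive projection algorithm: pick K rows of U that act as simplex vertices."""
    resid = U.copy()
    idx = []
    for _ in range(K):
        j = int(np.argmax(np.sum(resid**2, axis=1)))  # largest remaining row norm
        idx.append(j)
        u = resid[j] / np.linalg.norm(resid[j])
        resid = resid - np.outer(resid @ u, u)  # project out the chosen direction
    return idx


def covariate_assisted_gom(R, X, K, lam=1.0, n_iter=100):
    """Hypothetical sketch of a covariate-assisted spectral GoM estimator."""
    # Assumed weighting (hypothetical): Gram matrix R R^T + lam * X X^T.
    M = np.hstack([R, np.sqrt(lam) * X])
    U = hetero_pca(M, K, n_iter)
    idx = spa(U, K)
    # Barycentric coordinates of each row of U w.r.t. the estimated vertices.
    Pi = U @ np.linalg.inv(U[idx])
    Pi = np.clip(Pi, 0.0, None)
    Pi /= Pi.sum(axis=1, keepdims=True)  # project back onto the probability simplex
    # Item parameters by least squares on R ~ Pi Theta^T, clipped to [0, 1].
    Theta_T, *_ = np.linalg.lstsq(Pi, R, rcond=None)
    Theta = np.clip(Theta_T.T, 0.0, 1.0)
    return Pi, Theta, idx


# Sanity check on noiseless synthetic expectations (no sampling noise).
rng = np.random.default_rng(0)
N, J, p, K = 200, 30, 10, 3
Pi_true = rng.dirichlet(np.ones(K), size=N)
Pi_true[:K] = np.eye(K)  # one pure subject per extreme profile
Theta_true = rng.uniform(0.1, 0.9, size=(J, K))  # item response probabilities
Gamma_true = rng.normal(size=(p, K))  # covariate loadings
R0 = Pi_true @ Theta_true.T
X0 = Pi_true @ Gamma_true.T

Pi_hat, Theta_hat, idx = covariate_assisted_gom(R0, X0, K, lam=0.5)
perm = np.argmax(Pi_true[idx], axis=1)  # align estimated and true profiles
pi_err = np.abs(Pi_hat - Pi_true[:, perm]).max()
theta_err = np.abs(Theta_hat - Theta_true[:, perm]).max()
print(f"max |Pi error| = {pi_err:.2e}, max |Theta error| = {theta_err:.2e}")
```

The demo uses noiseless expected matrices. In that setting the estimates should match the truth up to a column permutation and floating-point error. With sampled categorical responses and noisy covariates, the choice of `lam` governs how much each source contributes to the estimated subspace.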