Grade of Membership (GoM) models are popular individual-level mixture models for multivariate categorical data. GoM allows each subject to have mixed memberships in multiple extreme latent profiles. GoM models therefore have a richer modeling capacity than latent class models, which restrict each subject to a single profile. This flexibility comes at the cost of more challenging identifiability and estimation problems. In this work, we propose a singular value decomposition (SVD) based spectral approach to GoM analysis with multivariate binary responses. Our approach hinges on the observation that the expectation of the data matrix has a low-rank decomposition under a GoM model. For identifiability, we develop sufficient and almost necessary conditions for a notion of expectation identifiability. For estimation, we extract only a few leading singular vectors of the observed data matrix and exploit the simplex geometry of these vectors to estimate the mixed membership scores and other parameters. We also establish the consistency of our estimator in the double-asymptotic regime where both the number of subjects and the number of items grow to infinity. Our spectral method has a substantial computational advantage over Bayesian or likelihood-based methods and scales to large, high-dimensional data. Extensive simulation studies demonstrate the superior efficiency and accuracy of our method. We also illustrate our method by applying it to a personality test dataset.
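The estimation idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions that the abstract does not state: the expected data matrix is written as Pi @ Theta.T, with Pi the N x K membership matrix (rows on the simplex) and Theta the J x K item parameters; at least one pure subject exists for each profile; and the simplex vertices are located with a plain successive projection algorithm, which is swapped in here for whatever vertex-finding step the paper actually uses. The names `spectral_gom` and `successive_projection` are hypothetical.

```python
import numpy as np

def successive_projection(U, K):
    """Pick K rows of U that approximate the simplex vertices (assumed step)."""
    residual = U.copy()
    vertex_idx = []
    for _ in range(K):
        norms = np.linalg.norm(residual, axis=1)
        i = int(np.argmax(norms))          # farthest row is a vertex candidate
        vertex_idx.append(i)
        direction = residual[i] / norms[i]
        # Project every row onto the orthogonal complement of the chosen row.
        residual = residual - np.outer(residual @ direction, direction)
    return vertex_idx

def spectral_gom(R, K):
    """Hypothetical SVD-based estimate of memberships and item parameters."""
    R = np.asarray(R, dtype=float)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    U, s, Vt = U[:, :K], s[:K], Vt[:K]     # keep the K leading components
    V_hat = U[successive_projection(U, K)] # K x K estimated vertex matrix
    # Rows of U are convex combinations of the vertices: U = Pi @ V.
    Pi_hat = U @ np.linalg.inv(V_hat)
    # Ad hoc cleanup (an assumption): clip negatives, renormalize rows.
    Pi_hat = np.clip(Pi_hat, 0.0, None)
    Pi_hat /= np.maximum(Pi_hat.sum(axis=1, keepdims=True), 1e-12)
    # Pi @ Theta.T = U @ diag(s) @ Vt implies Theta.T = V @ diag(s) @ Vt.
    Theta_hat = np.clip((V_hat * s) @ Vt, 0.0, 1.0).T
    return Pi_hat, Theta_hat

# Simulated binary responses under the assumed parameterization.
rng = np.random.default_rng(0)
N, J, K = 500, 60, 3
Pi = rng.dirichlet(np.ones(K), size=N)
Pi[:K] = np.eye(K)                         # one pure subject per profile
Theta = rng.uniform(0.1, 0.9, size=(J, K))
R = rng.binomial(1, Pi @ Theta.T)
Pi_hat, Theta_hat = spectral_gom(R, K)
```

The estimated profiles come back in an arbitrary order, so any comparison with ground truth has to match columns up to a permutation. The sketch needs only one truncated SVD and a K x K matrix inverse, which is consistent with the computational advantage the abstract claims over likelihood-based fitting.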