Grade of Membership (GoM) models are popular individual-level mixture models for multivariate categorical data. A GoM model allows each subject to have mixed memberships in multiple extreme latent profiles, so it has a richer modeling capacity than latent class models, which restrict each subject to a single profile. This flexibility comes at the cost of more challenging identifiability and estimation problems. In this work, we propose a spectral approach to GoM analysis with multivariate binary responses, based on the singular value decomposition (SVD). Our approach hinges on the observation that the expectation of the data matrix has a low-rank decomposition under a GoM model. For identifiability, we develop sufficient and almost necessary conditions for a notion of expectation identifiability. For estimation, we extract only a few leading singular vectors of the observed data matrix and exploit the simplex geometry of these vectors to estimate the mixed membership scores and other parameters. We also establish the consistency of our estimator in the double-asymptotic regime where both the number of subjects and the number of items grow to infinity. Our spectral method has a substantial computational advantage over Bayesian and likelihood-based methods and scales to large, high-dimensional data. Extensive simulation studies demonstrate the superior efficiency and accuracy of our method. We also illustrate the method by applying it to a personality test dataset.
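
To make the low-rank and simplex observations concrete, the following is a minimal sketch rather than the estimator developed in the paper. It assumes the expected data matrix can be written as R0 = Pi Theta^T, where Pi (N x K) holds membership scores with rows on the probability simplex and Theta (J x K) holds item response probabilities. It further assumes each profile has at least one pure subject, and it uses a plain successive projection step to locate the simplex vertices. The vertex-search routine, the function names `successive_projection` and `spectral_gom`, and all simulation settings are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def successive_projection(U, K):
    """Return indices of K rows of U that act as simplex vertices.

    Assumption: plain successive projection; the abstract does not say
    which vertex-search routine the authors use.
    """
    residual = U.copy()
    vertices = []
    for _ in range(K):
        # The row with the largest norm is an extreme point of the point cloud.
        idx = int(np.argmax(np.sum(residual ** 2, axis=1)))
        vertices.append(idx)
        direction = residual[idx] / np.linalg.norm(residual[idx])
        # Project every row onto the orthogonal complement of that direction.
        residual = residual - np.outer(residual @ direction, direction)
    return vertices


def spectral_gom(R, K):
    """Estimate (Pi, Theta) from the top-K SVD of an N x J matrix R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    U, s, Vt = U[:, :K], s[:K], Vt[:K]
    vertices = successive_projection(U, K)
    B = U[vertices]                      # K x K matrix of vertex rows
    # Rows of U are convex combinations of the vertex rows: U = Pi @ B.
    Pi = np.clip(U @ np.linalg.inv(B), 0.0, None)
    Pi /= np.maximum(Pi.sum(axis=1, keepdims=True), 1e-12)
    # From U diag(s) V^T = Pi Theta^T and U = Pi B: Theta = V diag(s) B^T.
    Theta = np.clip(Vt.T @ np.diag(s) @ B.T, 0.0, 1.0)
    return Pi, Theta, vertices


# Simulated example (all settings below are arbitrary illustrative choices).
rng = np.random.default_rng(0)
N, J, K = 500, 60, 3
Pi_true = rng.dirichlet(np.ones(K), size=N)
Pi_true[:K] = np.eye(K)                  # one pure subject per profile
Theta_true = rng.uniform(0.1, 0.9, size=(J, K))
R0 = Pi_true @ Theta_true.T              # expected data matrix, rank K

# Noise-free check: run the procedure on the expectation itself.
Pi_hat, Theta_hat, vertices = spectral_gom(R0, K)
# Profiles are recovered only up to relabeling; match them via the vertices.
perm = np.argmax(Pi_true[vertices], axis=1)
print("max |Pi_hat - Pi_true|:", np.abs(Pi_hat - Pi_true[:, perm]).max())

# Binary responses drawn from the model; recovery is now only approximate.
R = rng.binomial(1, R0).astype(float)
Pi_noisy, Theta_noisy, _ = spectral_gom(R, K)
```

On the expectation matrix the recovery is exact up to floating-point error and a relabeling of the profiles, which is the geometric fact the estimator exploits. On sampled binary data the same steps give only an approximation, and the clipping and renormalization in `spectral_gom` are a crude way to keep the estimates inside their valid ranges.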