Grade of Membership (GoM) models are popular individual-level mixture models for multivariate categorical data. GoM allows each subject to have mixed memberships in multiple extreme latent profiles. GoM models therefore have greater modeling capacity than latent class models, which restrict each subject to a single profile. This flexibility comes at the cost of more challenging identifiability and estimation problems. In this work, we propose a singular value decomposition (SVD) based spectral approach to GoM analysis with multivariate binary responses. Our approach hinges on the observation that the expectation of the data matrix has a low-rank decomposition under a GoM model. For identifiability, we develop sufficient and almost necessary conditions for a notion of expectation identifiability. For estimation, we extract only a few leading singular vectors of the observed data matrix and exploit the simplex geometry of these vectors to estimate the mixed membership scores and other parameters. Our spectral method has a large computational advantage over Bayesian and likelihood-based methods and scales to large, high-dimensional data. Extensive simulation studies demonstrate the superior efficiency and accuracy of our method. We also illustrate the method by applying it to a personality test dataset.
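The estimation idea can be illustrated with a minimal sketch. It assumes the expected data matrix factors as the product of a membership matrix (rows on the probability simplex) and a matrix of item parameters, and that at least one pure subject exists for each profile. The vertex-finding step uses successive projection, a standard choice assumed here for illustration; the abstract does not say which procedure the method uses. All function names and simulation settings (`simulate_gom`, `successive_projection`, `spectral_gom`, the Dirichlet and uniform parameters) are hypothetical.

```python
import numpy as np

def simulate_gom(n, j, k, rng):
    """Simulate binary responses from a GoM-style model (illustrative settings)."""
    pi = rng.dirichlet(np.full(k, 0.5), size=n)  # mixed membership scores, rows sum to 1
    pi[:k] = np.eye(k)                           # assumption: one pure subject per profile
    theta = rng.uniform(0.1, 0.9, size=(j, k))   # item response probabilities per profile
    r = rng.binomial(1, pi @ theta.T)            # E[R] = pi @ theta.T has rank at most k
    return r, pi, theta

def successive_projection(u, k):
    """Pick k rows of u that approximate the simplex vertices."""
    resid = u.copy()
    idx = []
    for _ in range(k):
        i = int(np.argmax(np.sum(resid ** 2, axis=1)))  # row with the largest norm
        idx.append(i)
        v = resid[i] / np.linalg.norm(resid[i])
        resid = resid - np.outer(resid @ v, v)          # project out that direction
    return idx

def spectral_gom(r, k):
    """Estimate memberships and item parameters from the top-k SVD of r."""
    u, s, vt = np.linalg.svd(np.asarray(r, dtype=float), full_matrices=False)
    u, s, vt = u[:, :k], s[:k], vt[:k]
    idx = successive_projection(u, k)
    # Rows of u are convex combinations of the vertex rows: u = pi @ u[idx].
    pi_hat = np.clip(u @ np.linalg.inv(u[idx]), 0.0, None)
    pi_hat /= np.maximum(pi_hat.sum(axis=1, keepdims=True), 1e-12)
    # Vertex rows of the rank-k reconstruction give the profile response probabilities.
    theta_hat = np.clip((u[idx] * s) @ vt, 0.0, 1.0).T
    return pi_hat, theta_hat

rng = np.random.default_rng(0)
r, pi, theta = simulate_gom(n=500, j=30, k=3, rng=rng)
pi_hat, theta_hat = spectral_gom(r, k=3)
```

The estimated profiles are recovered only up to a permutation of the columns, so any comparison with the true parameters should first align the columns. The sketch needs only a truncated SVD and a small matrix inverse, which is the source of the computational advantage over iterative Bayesian or likelihood-based fitting.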