Mixed Membership Models (MMMs) are a popular family of latent structure models for complex multivariate data. Instead of forcing each subject to belong to a single cluster, MMMs incorporate a vector of subject-specific weights characterizing partial membership across clusters. With this flexibility come challenges in uniquely identifying, estimating, and interpreting the parameters. In this article, we propose a new class of Dimension-Grouped MMMs (Gro-M$^3$s) for multivariate categorical data, which improve parsimony and interpretability. In Gro-M$^3$s, observed variables are partitioned into groups such that the latent membership is constant for variables within a group but can differ across groups. Traditional latent class models are obtained when all variables are in one group, while traditional MMMs are obtained when each variable is in its own group. The new model corresponds to a novel decomposition of probability tensors. Theoretically, we derive transparent identifiability conditions for both the unknown grouping structure and the model parameters in general settings. Methodologically, we propose a Bayesian approach for Dirichlet Gro-M$^3$s that infers the variable grouping structure and estimates the model parameters. Simulation results demonstrate good computational performance and empirically confirm the identifiability results. We illustrate the new methodology through applications to a functional disability survey dataset and a personality test dataset.
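The grouping structure described above can be made concrete with a small simulation. The following is a minimal sketch of one plausible generative reading of a Dirichlet Gro-M$^3$, not the authors' implementation: each subject draws a membership weight vector from a Dirichlet distribution, draws one latent profile per variable group from those weights, and then draws every variable in a group from that group's profile. The function name `simulate_gro_m3`, its arguments, and the choice to draw the conditional response probabilities `lam` from a flat Dirichlet are all assumptions made for illustration.

```python
import numpy as np


def simulate_gro_m3(n_subjects, groups, n_profiles, n_categories, alpha, rng):
    """Simulate categorical data from a sketch of a Dirichlet Gro-M^3.

    groups[j] is the (0-based) group index of variable j; this grouping is
    treated as known here, whereas the paper infers it from data.
    """
    groups = np.asarray(groups)
    n_vars = groups.size
    n_groups = int(groups.max()) + 1

    # lam[j, k, :] = distribution of variable j under latent profile k
    # (assumption: drawn from a flat Dirichlet purely for illustration).
    lam = rng.dirichlet(np.ones(n_categories), size=(n_vars, n_profiles))

    # pi[i, :] = subject-specific partial membership weights.
    pi = rng.dirichlet(alpha, size=n_subjects)

    z = np.empty((n_subjects, n_groups), dtype=int)
    y = np.empty((n_subjects, n_vars), dtype=int)
    for i in range(n_subjects):
        # One latent profile per group, shared by all variables in that group.
        z[i] = rng.choice(n_profiles, size=n_groups, p=pi[i])
        for j in range(n_vars):
            y[i, j] = rng.choice(n_categories, p=lam[j, z[i, groups[j]]])
    return y, z, pi


rng = np.random.default_rng(0)
# Five variables in three groups: {0, 1}, {2, 3}, {4}.
y, z, pi = simulate_gro_m3(50, [0, 0, 1, 1, 2], 3, 4, np.full(3, 0.5), rng)
```

Under this reading, passing `groups=[0, 0, 0, 0, 0]` gives every variable the same latent profile, which matches the latent class special case, and passing `groups=[0, 1, 2, 3, 4]` gives each variable its own draw, which matches the traditional MMM special case.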