Dimension-Grouped Mixed Membership Models for Multivariate Categorical Data

Mixed Membership Models (MMMs) are a popular family of latent structure models for complex multivariate data. Instead of forcing each subject to belong to a single cluster, MMMs incorporate a vector of subject-specific weights characterizing partial membership across clusters. With this flexibility come challenges in uniquely identifying, estimating, and interpreting the parameters. In this article, we propose a new class of Dimension-Grouped MMMs (Gro-M$^3$s) for multivariate categorical data, which improve parsimony and interpretability. In Gro-M$^3$s, observed variables are partitioned into groups such that the latent membership is constant for variables within a group but can differ across groups. Traditional latent class models are obtained when all variables are in one group, while traditional MMMs are obtained when each variable is in its own group. The new model corresponds to a novel decomposition of probability tensors. Theoretically, we derive transparent identifiability conditions for both the unknown grouping structure and model parameters in general settings. Methodologically, we propose a Bayesian approach for Dirichlet Gro-M$^3$s to inferring the variable grouping structure and estimating model parameters. Simulation results demonstrate good computational performance and empirically confirm the identifiability results. We illustrate the new methodology through applications to a functional disability survey dataset and a personality test dataset.
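
The grouping idea described above can be made concrete with a small generative simulation. The following is a minimal sketch, not the authors' implementation: it assumes a Dirichlet prior on each subject's membership weights, one latent profile drawn per variable group, and a categorical response distribution per variable and profile. The function name `simulate_grom3` and the arguments `groups`, `alpha`, and `lambdas` are hypothetical names chosen for illustration.

```python
import numpy as np

def simulate_grom3(n, groups, alpha, lambdas, rng):
    """Simulate n subjects from a sketch of a Dirichlet dimension-grouped MMM.

    groups  : length-p sequence; groups[j] is the group index (0..G-1) of variable j
    alpha   : length-K Dirichlet parameter for the membership weights (assumed prior)
    lambdas : array of shape (p, K, d); lambdas[j, k] is the distribution of
              variable j over its d categories under extreme profile k
    """
    groups = np.asarray(groups)
    p = groups.size
    n_groups = int(groups.max()) + 1
    K = len(alpha)
    d = lambdas.shape[2]

    # Subject-specific partial membership weights over the K profiles.
    pi = rng.dirichlet(alpha, size=n)            # shape (n, K)
    Y = np.empty((n, p), dtype=int)
    for i in range(n):
        # One latent profile per group: shared by all variables in that group,
        # but free to differ across groups.
        z = rng.choice(K, size=n_groups, p=pi[i])
        for j in range(p):
            Y[i, j] = rng.choice(d, p=lambdas[j, z[groups[j]]])
    return Y, pi

# Example: 6 variables in 3 groups of 2, K = 3 profiles, d = 3 categories.
rng = np.random.default_rng(0)
p, K, d = 6, 3, 3
lambdas = rng.dirichlet(np.ones(d), size=(p, K))  # shape (p, K, d)
groups = [0, 0, 1, 1, 2, 2]
Y, pi = simulate_grom3(100, groups, np.ones(K), lambdas, rng)
```

Under these assumptions, setting `groups = [0] * p` draws a single profile per subject, which matches the latent class special case, while `groups = list(range(p))` draws a separate profile for every variable, which matches the traditional MMM special case.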
