In social science research, understanding latent structures in populations through survey data with categorical responses is a common and important task. Traditional methods have limitations: Factor Analysis is poorly suited to categorical data, and Latent Class Analysis does not accommodate mixed memberships in latent structures. Moreover, choosing the number of factors or latent classes is often subjective and can be challenging in the presence of missing values. This study introduces a Hierarchical Dirichlet Process Mixture of Products of Multinomial Distributions (HDPMPM) model, which leverages the flexibility of nonparametric Bayesian methods to address these limitations. The HDPMPM model allows an individual to belong to multiple latent classes and avoids fixing the number of mixture components at an arbitrary value. It also incorporates missing data imputation directly into the model's Gibbs sampling process. Using a truncated stick-breaking representation of the Dirichlet process, we derive a Gibbs sampling scheme for posterior inference. An application of the HDPMPM model to the 2016 American National Election Study (ANES) data demonstrates its effectiveness in identifying political profiles and in handling missing data scenarios, including data missing at random (MAR) and missing completely at random (MCAR). The results show that the HDPMPM model successfully recovers dominant profiles and manages complex latent structures in survey data, offering social science researchers an alternative tool for analyzing categorical data with missing values.
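To make the truncated stick-breaking construction concrete, the following is a minimal sketch of drawing mixture weights from a single-level Dirichlet process prior truncated at K components. It is an illustration only, not the paper's sampler: the function name `truncated_stick_breaking`, the concentration value `alpha=1.0`, and the truncation level `K=20` are all assumptions chosen for the example, and the hierarchical (group-level) layer of the HDPMPM is not shown.

```python
import numpy as np


def truncated_stick_breaking(alpha, K, rng):
    """Draw K mixture weights from a DP(alpha) prior truncated at K components.

    Hypothetical helper for illustration; alpha and K are assumed values,
    not settings taken from the paper.
    """
    # Stick proportions: v_k ~ Beta(1, alpha)
    v = rng.beta(1.0, alpha, size=K)
    # Truncation: the last component takes all remaining stick mass
    v[-1] = 1.0
    # Mass left before breaking stick k: prod_{j<k} (1 - v_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    # Weight of component k: pi_k = v_k * prod_{j<k} (1 - v_j)
    return v * remaining


rng = np.random.default_rng(0)
weights = truncated_stick_breaking(alpha=1.0, K=20, rng=rng)
```

Setting the final stick proportion to 1 is what makes the truncated weights sum to exactly one, which in turn lets a Gibbs sampler work with a finite-dimensional weight vector. Smaller values of `alpha` concentrate mass on fewer components.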