Several approaches have been proposed in the literature for clustering multivariate ordinal data. These methods typically treat missing values as absent information, rather than recognizing them as valuable for profiling population characteristics. To address this gap, we introduce a Bayesian nonparametric model for co-clustering multivariate ordinal data that treats censored observations as informative, rather than merely missing. We demonstrate that this offers a significant improvement in understanding the underlying structure of the data. Our model exploits the flexibility of two independent Dirichlet processes, allowing us to infer potentially distinct subpopulations that characterize the latent structure of both subjects and variables. The ordinal nature of the data is addressed by introducing latent variables, while a matrix factorization specification is adopted to handle the high dimensionality of the data in a parsimonious way. The conjugate structure of the model enables an explicit derivation of the full conditional distributions of all the random variables in the model, which facilitates seamless posterior inference using a Gibbs sampling algorithm. We demonstrate the method's performance through simulations and by analyzing politician and movie ratings data.
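To make the generative idea concrete, the following is a minimal simulation sketch, not the authors' model. It assumes that each of the two independent Dirichlet processes can be represented by a Chinese restaurant process over subjects and over variables, that each (subject cluster, variable cluster) block has its own Gaussian latent mean, and that ordinal ratings arise by thresholding the latent values at fixed cutpoints. The function names `crp_partition` and `simulate_ordinal`, the concentration parameters, and the cutpoints are all hypothetical; the matrix factorization, the informative treatment of censored entries, and the Gibbs sampler described above are not implemented here.

```python
import numpy as np

def crp_partition(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process."""
    labels = np.zeros(n, dtype=int)
    counts = [1]  # the first item opens the first cluster
    for i in range(1, n):
        # Existing clusters are chosen in proportion to their size,
        # a new cluster in proportion to the concentration alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels[i] = k
    return labels

def simulate_ordinal(n_subjects, n_variables, cutpoints,
                     alpha_rows=1.0, alpha_cols=1.0, seed=0):
    """Simulate co-clustered ordinal ratings (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    # Two independent partitions: one for subjects, one for variables.
    row_labels = crp_partition(n_subjects, alpha_rows, rng)
    col_labels = crp_partition(n_variables, alpha_cols, rng)
    # One latent mean per (subject cluster, variable cluster) block.
    block_means = rng.normal(
        0.0, 2.0, size=(row_labels.max() + 1, col_labels.max() + 1)
    )
    # Continuous latent variable behind every rating.
    latent = block_means[row_labels][:, col_labels]
    latent = latent + rng.normal(size=(n_subjects, n_variables))
    # Thresholding yields categories 1, ..., len(cutpoints) + 1.
    ratings = np.digitize(latent, cutpoints) + 1
    return row_labels, col_labels, ratings
```

For example, `simulate_ordinal(50, 12, cutpoints=[-1.5, -0.5, 0.5, 1.5])` returns subject labels, variable labels, and a 50 × 12 matrix of five-level ratings. Posterior inference would run in the opposite direction, recovering the two partitions from the observed ratings.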