A Bayesian Model for Co-clustering Ordinal Data with Informative Missing Entries

Several approaches have been proposed in the literature for clustering multivariate ordinal data. These methods typically treat missing values as absent information, rather than recognizing them as valuable for profiling population characteristics. To address this gap, we introduce a Bayesian nonparametric model for co-clustering multivariate ordinal data that treats censored observations as informative, rather than merely missing. We demonstrate that this offers a significant improvement in understanding the underlying structure of the data. Our model exploits the flexibility of two independent Dirichlet processes, allowing us to infer potentially distinct subpopulations that characterize the latent structure of both subjects and variables. The ordinal nature of the data is addressed by introducing latent variables, while a matrix factorization specification is adopted to handle the high dimensionality of the data in a parsimonious way. The conjugate structure of the model enables an explicit derivation of the full conditional distributions of all the random variables in the model, which facilitates seamless posterior inference using a Gibbs sampling algorithm. We demonstrate the method's performance through simulations and by analyzing politician and movie ratings data.
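The generative idea described above can be illustrated with a minimal sketch. It is not the authors' specification: the abstract gives no priors, cutpoints, or missingness mechanism, so everything named here is an assumption made for illustration. That includes the Chinese restaurant process representation of each Dirichlet process, the Gaussian block means, the fixed `cutpoints`, and the function names `crp_partition` and `simulate_ordinal`. The sketch draws independent partitions for subjects and variables, builds a latent Gaussian matrix with one mean per (subject cluster, variable cluster) block, and thresholds it into ordinal categories. It omits the informative-missingness component and the matrix factorization entirely.

```python
import numpy as np


def crp_partition(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process."""
    labels = np.zeros(n, dtype=int)
    counts = [1]  # the first item opens the first cluster
    for i in range(1, n):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels[i] = k
    return labels


def simulate_ordinal(n_subjects, n_variables, cutpoints,
                     alpha_rows=1.0, alpha_cols=1.0, seed=0):
    """Simulate co-clustered ordinal ratings (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    # Two independent partitions: one over subjects, one over variables.
    rows = crp_partition(n_subjects, alpha_rows, rng)
    cols = crp_partition(n_variables, alpha_cols, rng)
    # Assumed: one Gaussian mean per (row cluster, column cluster) block.
    block_means = rng.normal(0.0, 2.0, size=(rows.max() + 1, cols.max() + 1))
    latent = block_means[rows][:, cols] + rng.normal(size=(n_subjects, n_variables))
    # Threshold the latent values into categories 1..len(cutpoints)+1.
    ratings = np.digitize(latent, cutpoints) + 1
    return rows, cols, ratings


rows, cols, ratings = simulate_ordinal(30, 8, cutpoints=[-1.5, 0.0, 1.5])
print(ratings.shape, len(set(rows)), len(set(cols)))
```

In a Gibbs sampler for a model of this kind, the latent matrix and the two label vectors would be updated in turn from their full conditionals; the sketch only covers forward simulation.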

