Change-point models deal with ordered data sequences. Their primary goal is to infer the locations at which some aspect of the data sequence changes. In this paper, we propose and implement a nonparametric Bayesian model, fitted via a Gibbs sampler, that clusters observations based on their constant-wise change-point profiles. The model places a Dirichlet process on the constant-wise change-point structures, so that observations are clustered while change points are estimated simultaneously. In addition, the approach controls the number of clusters within the model, so this number need not be specified a priori. We evaluate the method's performance on simulated data under various scenarios and on a real dataset from single-cell genomic sequencing.