Change-point models deal with ordered data sequences. Their primary goal is to infer the locations where an aspect of the data sequence changes. In this paper, we propose and implement a nonparametric Bayesian model that clusters observations based on their constant-wise change-point profiles, with inference carried out via a Gibbs sampler. Our model places a Dirichlet process on the constant-wise change-point structures to cluster observations while simultaneously performing multiple change-point estimation. Additionally, our approach controls the number of clusters in the model and does not require the number of clusters to be specified \textit{a priori}. The method achieved satisfactory clustering and estimation results across various simulated scenarios and on a real dataset from single-cell genomic sequencing. Our proposed methodology is implemented as an R package called BayesCPclust, available at \texttt{https://github.com/acarolcruz/BayesCPclust}.