Thanks to their dependency structure, non-parametric hidden Markov models (HMMs) are able to handle model-based clustering without specifying group distributions. The aim of this work is to study the Bayes risk of clustering when using HMMs and to propose associated clustering procedures. We first give a result linking the Bayes risk of classification and the Bayes risk of clustering, which we use to identify the key quantity determining the difficulty of the clustering task. We also give a proof of this result in the i.i.d. framework, which might be of independent interest. Then we study the excess risk of the plug-in classifier. All these results are shown to remain valid in the online setting where observations are clustered sequentially. Simulations illustrate our findings.
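To make the objects named in the abstract concrete, here is a minimal sketch, not the authors' procedure. It assumes a two-state HMM with unit-variance Gaussian emissions and known parameters (the paper's setting is non-parametric, so the Gaussian choice is purely illustrative). The helper names `smoothing_posteriors` and `clustering_error` are hypothetical. The classifier assigns each observation to the state with the largest smoothing probability, and the clustering error is taken as the classification error minimized over relabelings of the states. Substituting estimates for `pi`, `Q` and `lik` would turn the same rule into a plug-in rule.

```python
import itertools
import numpy as np


def smoothing_posteriors(pi, Q, lik):
    """Scaled forward-backward: returns P(X_t = k | Y_1..Y_T), shape (T, K).

    pi: initial distribution (K,), Q: transition matrix (K, K),
    lik[t, k]: emission density of state k at observation t (up to a
    constant that does not depend on k).
    """
    T, K = lik.shape
    alpha = np.empty((T, K))
    beta = np.ones((T, K))
    scale = np.empty(T)
    alpha[0] = pi * lik[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):  # forward (filtering) pass
        alpha[t] = (alpha[t - 1] @ Q) * lik[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):  # backward pass, same scaling
        beta[t] = Q @ (lik[t + 1] * beta[t + 1]) / scale[t + 1]
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)


def clustering_error(pred, truth, K):
    """Misclassification rate minimized over permutations of the K labels."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    return min(
        float(np.mean(np.asarray(perm)[pred] != truth))
        for perm in itertools.permutations(range(K))
    )


# Illustrative parameters (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
K, T = 2, 500
pi = np.array([0.5, 0.5])
Q = np.array([[0.9, 0.1], [0.1, 0.9]])
means = np.array([-1.0, 1.0])

# Simulate the hidden chain and the observations.
states = np.empty(T, dtype=int)
states[0] = rng.choice(K, p=pi)
for t in range(1, T):
    states[t] = rng.choice(K, p=Q[states[t - 1]])
obs = means[states] + rng.normal(size=T)

# Unit-variance Gaussian densities; the normalizing constant cancels.
lik = np.exp(-0.5 * (obs[:, None] - means[None, :]) ** 2)

post = smoothing_posteriors(pi, Q, lik)
labels = post.argmax(axis=1)  # pointwise MAP under the smoothing law
print("clustering error:", clustering_error(labels, states, K))
```

When all rows of `Q` equal `pi`, the observations are i.i.d. draws from a mixture and the smoothing probabilities reduce to the usual mixture posteriors, which gives a quick sanity check of the recursion.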