Thanks to their dependency structure, non-parametric Hidden Markov Models (HMMs) are able to handle model-based clustering without specifying group distributions. The aim of this work is to study the Bayes risk of clustering when using HMMs and to propose associated clustering procedures. We first give a result linking the Bayes risk of classification to the Bayes risk of clustering, which we use to identify the key quantity determining the difficulty of the clustering task. We also give a proof of this result in the i.i.d. framework, which might be of independent interest. Then we study the excess risk of the plug-in classifier. All these results are shown to remain valid in the online setting, where observations are clustered sequentially. Simulations illustrate our findings.