Hidden Markov models (HMMs) are flexible tools for clustering dependent data drawn from unknown populations, and they allow nonparametric modelling of the population densities. Identifiability fails when the data are in fact independent, and we study the frontier between learnable and unlearnable two-state nonparametric HMMs. Compared with the multinomial setting considered previously, interesting new phenomena emerge when the cluster distributions are modelled via density functions (the 'emission' densities) belonging to standard smoothness classes. Notably, identifying a direction that separates the two emission densities becomes a critical and challenging issue. Surprisingly, it is possible to "borrow strength" from estimators of the smoother density to improve estimation of the other. We conduct a precise analysis of minimax rates, showing a transition that depends on the relative smoothnesses of the emission densities.
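To make the independence boundary concrete, the following is a minimal simulation sketch, not the paper's method. It assumes Gaussian emissions with unit variance and means 0 and 3, and the names `simulate_hmm`, `lag1_autocorr`, `p`, and `q` are hypothetical, introduced only for illustration. With `p` the probability of moving from state 0 to state 1 and `q` the probability of moving back, the two rows of the transition matrix coincide exactly when `p + q = 1`. In that case the hidden states, and hence the observations, are independent draws from a two-component mixture, and the dependence that separates the components is lost.

```python
import random


def simulate_hmm(n, p, q, means=(0.0, 3.0), seed=0):
    """Simulate n observations from a two-state HMM.

    p = P(state 0 -> state 1), q = P(state 1 -> state 0).
    Emissions are N(means[state], 1); the Gaussian choice is an
    illustrative assumption, not the nonparametric setting of the paper.
    """
    rng = random.Random(seed)
    # Start from the stationary distribution: P(state 1) = p / (p + q).
    state = 1 if rng.random() < p / (p + q) else 0
    obs = []
    for _ in range(n):
        obs.append(rng.gauss(means[state], 1.0))
        # Leave the current state with probability p (from 0) or q (from 1).
        leave = p if state == 0 else q
        if rng.random() < leave:
            state = 1 - state
    return obs


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var


# Persistent chain (p + q < 1): observations are positively correlated.
dependent = simulate_hmm(20000, p=0.1, q=0.1)
# Equal transition rows (p + q = 1): observations are i.i.d. mixture draws.
independent = simulate_hmm(20000, p=0.3, q=0.7)

print("lag-1 autocorrelation, persistent chain:", lag1_autocorr(dependent))
print("lag-1 autocorrelation, equal rows:      ", lag1_autocorr(independent))
```

In this sketch the lag-1 autocorrelation is proportional to `1 - p - q`, so it shrinks to zero as the chain approaches the independent case. That is one informal way to picture the frontier between learnable and unlearnable models described above.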