While model-based clustering continues to advance, challenges persist in modeling certain data types, such as panel data. Multivariate panel data are difficult for clustering algorithms because of their unique correlation structure, a consequence of observing several subjects at multiple time points. Panel data are also often plagued by missing data and dropout, which complicate estimation. This research presents a family of hidden Markov models that account for the correlation structures that arise in panel data. A modified expectation-maximization (EM) algorithm capable of handling dropout and data that are missing not at random (MNAR) is presented and used to perform model estimation.