Multistate Markov models are a canonical parametric approach for modeling observed or latent stochastic processes supported on a finite state space. Continuous-time Markov processes are suited to data observed at irregular times, as is often the case in longitudinal medical data. If a continuous-time Markov process is assumed to be time-homogeneous, a closed-form likelihood function can be derived from the Kolmogorov forward equations, a system of differential equations with a well-known matrix-exponential solution. For continuous-time, time-inhomogeneous Markov processes, however, the forward equations do not in general admit an analytical solution, and so researchers and practitioners often make the simplifying assumption that the process is piecewise time-homogeneous. In this paper, we provide intuition for, and illustrations of, the biases in parameter estimation that may arise in the more realistic scenario in which the piecewise-homogeneous assumption is violated, and we advocate for a solution that computes the likelihood in a truly time-inhomogeneous fashion. We focus in particular on multistate Markov models that allow for state label misclassification, a setting that extends more broadly to hidden Markov models (HMMs), and on Bayesian computation, which bypasses the computationally demanding numerical gradient approximations needed to obtain maximum likelihood estimates (MLEs). Supplemental materials are available online.
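The contrast between the matrix-exponential solution and a truly time-inhomogeneous computation can be illustrated with a minimal sketch. It is not the paper's implementation: the two-state generator `Q_of_t`, its rate values, the left-endpoint freezing rule, and the fixed-step Runge-Kutta integrator are all assumptions chosen for illustration. The sketch solves the forward equations dP(s,t)/dt = P(s,t) Q(t) numerically and compares the result with exp(Q(s)(t - s)), the transition matrix obtained by treating the generator as constant over the interval.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=20, squarings=10):
    """exp(A) by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    scale = 2.0 ** squarings
    S = [[a / scale for a in row] for row in A]
    result, term = identity(n), identity(n)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, S)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def Q_of_t(t):
    """Hypothetical generator (assumption): the 1 -> 2 rate grows with time."""
    a = 0.1 * math.exp(0.5 * t)  # rate of leaving state 1
    b = 0.2                      # rate of leaving state 2
    return [[-a, a], [b, -b]]

def solve_forward(Q, s, t, steps=2000):
    """Integrate dP/du = P(u) Q(u) from u = s to u = t with P(s) = I (RK4)."""
    P = identity(len(Q(s)))
    h = (t - s) / steps

    def f(u, M):
        return mat_mul(M, Q(u))

    def shifted(M, K, c):
        return [[m + c * k for m, k in zip(mr, kr)] for mr, kr in zip(M, K)]

    u = s
    for _ in range(steps):
        k1 = f(u, P)
        k2 = f(u + h / 2, shifted(P, k1, h / 2))
        k3 = f(u + h / 2, shifted(P, k2, h / 2))
        k4 = f(u + h, shifted(P, k3, h))
        P = [[p + h / 6 * (a + 2 * b + 2 * c + d)
              for p, a, b, c, d in zip(pr, r1, r2, r3, r4)]
             for pr, r1, r2, r3, r4 in zip(P, k1, k2, k3, k4)]
        u += h
    return P

s, t = 0.0, 2.0
# Time-inhomogeneous transition matrix from the forward equations.
P_inhom = solve_forward(Q_of_t, s, t)
# Homogeneous approximation: generator frozen at its value at time s.
P_frozen = mat_exp([[q * (t - s) for q in row] for row in Q_of_t(s)])

print("inhomogeneous P(1 -> 2):", P_inhom[0][1])
print("frozen-rate   P(1 -> 2):", P_frozen[0][1])
```

Because the assumed 1 -> 2 rate increases over the interval, freezing it at its initial value understates the probability of that transition. A likelihood built from such transition matrices would inherit the discrepancy, which is the kind of bias the abstract refers to.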