Dynamic Classification of Latent Disease Progression with Auxiliary Surrogate Labels

Predicting disease progression from patients' evolving health information is challenging when true disease states are unknown because of limited diagnostic capabilities or high costs. For example, the absence of gold-standard neurological diagnoses hinders distinguishing Alzheimer's disease (AD) from related conditions such as AD-related dementias (ADRDs), including Lewy body dementia (LBD). Combining temporally dependent surrogate labels and health markers may improve disease prediction. However, the existing literature models informative surrogate labels and observed variables that reflect the underlying states with purely generative approaches, which limits the ability to predict future states. We propose integrating the conventional hidden Markov model, used as a generative model, with a time-varying discriminative classification model to simultaneously handle potentially misspecified surrogate labels and incorporate important markers of disease progression. We develop an adaptive forward-backward algorithm with subjective labels for estimation, and use modified posterior and Viterbi algorithms to predict the progression of future states, or of new patients, based on objective markers only. Importantly, the adaptation eliminates the need to model the marginal distribution of the longitudinal markers, a requirement in traditional algorithms. Asymptotic properties are established, and simulation studies demonstrate significant improvement in finite samples. Analysis of the neuropathological dataset of the National Alzheimer's Coordinating Center (NACC) shows substantially improved accuracy in distinguishing LBD from AD.
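For orientation, the conventional forward-backward recursion that the proposed algorithm adapts can be sketched as below. This is a minimal sketch of the standard textbook algorithm only, not the adaptive estimator described above: it treats surrogate labels as noisy discrete emissions of a latent state and ignores the health markers and the discriminative component entirely. The function name `forward_backward`, the two-state setup, and all numeric values are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def forward_backward(init, trans, emit, labels):
    """Standard scaled forward-backward for a discrete HMM.

    init:   (K,)   initial state probabilities (assumed known here)
    trans:  (K, K) transition matrix, rows sum to 1
    emit:   (K, M) emit[k, m] = P(surrogate label m | latent state k)
    labels: sequence of observed surrogate labels in {0, ..., M-1}
    Returns a (T, K) array of P(state_t = k | all labels).
    """
    T, K = len(labels), len(init)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    scale = np.zeros(T)

    # Forward pass, rescaled at each step to avoid numerical underflow.
    alpha[0] = init * emit[:, labels[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, labels[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, using the same scaling constants.
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[:, labels[t + 1]] * beta[t + 1]) / scale[t + 1]

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Illustrative values only (hypothetical two-state example).
init = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
emit = np.array([[0.8, 0.2], [0.3, 0.7]])  # imperfect surrogate labels
posterior = forward_backward(init, trans, emit, [0, 0, 1, 1])
```

Each row of `posterior` is a smoothed distribution over the latent states at one visit. In this conventional form the emission model must be fully specified, which is the kind of generative requirement the abstract says the adaptation is designed to relax for the longitudinal markers.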
