Identifying phenotypes within complex diseases or syndromes is a fundamental component of precision medicine, which aims to tailor healthcare to individual patient characteristics. Postoperative delirium (POD) is a complex neuropsychiatric condition with significant heterogeneity in its clinical manifestations and underlying pathophysiology. We hypothesize that POD comprises several distinct phenotypes that cannot be directly observed in clinical practice. Identifying these phenotypes could enhance our understanding of POD pathogenesis and facilitate the development of targeted prevention and treatment strategies. In this paper, we propose an approach that combines supervised machine learning for personalized POD risk prediction with unsupervised clustering to uncover potential POD phenotypes. We first demonstrate the approach on synthetic data, simulating patient cohorts with predefined phenotypes based on distinct sets of informative features. The synthetic data generation method is intended to be general enough to mimic other clinical diseases, not only POD. By training a predictive model and applying SHAP (SHapley Additive exPlanations), we show that clustering patients in the SHAP feature importance space successfully recovers the true underlying phenotypes and outperforms clustering in the raw feature space. We then present a case study using real-world data from a cohort of elderly surgical patients. The results showcase the utility of our approach in uncovering clinically relevant subtypes of complex disorders such as POD, paving the way for more precise and personalized treatment strategies.
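The pipeline described above (simulate phenotypes, train a risk model, compute SHAP values, cluster in SHAP space) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two toy phenotypes, the feature counts and noise levels, the use of logistic regression as the risk model, clustering only the POD-positive patients, and the 2-means routine. For a linear model with independent features, SHAP values on the log-odds scale have the closed form w_j (x_j - E[x_j]), which is what the sketch uses in place of a general SHAP explainer.

```python
import math
import random

random.seed(0)
N_NOISE = 6  # number of uninformative features (illustrative assumption)

def make_patient(phenotype):
    """0 = control, 1 = case driven by feature 0, 2 = case driven by feature 1."""
    x0 = random.gauss(4.0 if phenotype == 1 else 0.0, 0.5)
    x1 = random.gauss(4.0 if phenotype == 2 else 0.0, 0.5)
    return [x0, x1] + [random.gauss(0.0, 5.0) for _ in range(N_NOISE)]

# Hypothetical cohort: 200 controls and 100 cases of each toy phenotype.
phenotypes = [0] * 200 + [1] * 100 + [2] * 100
X = [make_patient(p) for p in phenotypes]
y = [1 if p > 0 else 0 for p in phenotypes]
n, d = len(X), len(X[0])

# Supervised step: logistic regression fitted by batch gradient descent
# (a stand-in for whatever risk model is actually used).
w, b = [0.0] * d, 0.0
for _ in range(300):
    grad_w, grad_b = [0.0] * d, 0.0
    for xi, yi in zip(X, y):
        z = b + sum(wj * xj for wj, xj in zip(w, xi))
        err = 1.0 / (1.0 + math.exp(-z)) - yi
        grad_b += err
        for j in range(d):
            grad_w[j] += err * xi[j]
    b -= 0.1 * grad_b / n
    w = [wj - 0.1 * gj / n for wj, gj in zip(w, grad_w)]

# Explanation step: closed-form SHAP values of a linear model (log-odds scale).
means = [sum(col) / n for col in zip(*X)]

def linear_shap(x):
    return [wj * (xj - mj) for wj, xj, mj in zip(w, x, means)]

def dist2(a, c):
    return sum((ai - ci) ** 2 for ai, ci in zip(a, c))

def kmeans2(points, iters=25):
    """Plain 2-means (Lloyd's algorithm) with deterministic farthest-point init."""
    centroids = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def agreement(labels, truth):
    """Fraction of cases whose cluster matches the phenotype, up to label swap."""
    match = sum((lab == 0) == (t == 1) for lab, t in zip(labels, truth)) / len(truth)
    return max(match, 1.0 - match)

# Unsupervised step: cluster POD-positive patients in SHAP space and in raw space.
cases = [i for i in range(n) if y[i] == 1]
truth = [phenotypes[i] for i in cases]
shap_agreement = agreement(kmeans2([linear_shap(X[i]) for i in cases]), truth)
raw_agreement = agreement(kmeans2([X[i] for i in cases]), truth)
print(f"agreement with true phenotypes: SHAP space {shap_agreement:.2f}, "
      f"raw space {raw_agreement:.2f}")
```

In this toy setup the SHAP transformation rescales each feature by its learned weight, so the high-variance noise features that dominate Euclidean distances in the raw space contribute little in SHAP space. With a non-linear model, such as a tree ensemble, the closed form no longer applies and a general SHAP explainer would be needed.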