There is no gold standard for diagnosing Alzheimer's disease (AD) other than autopsy. Unsupervised learning can provide insight into the pathophysiology of AD. A mixture of regressions can simultaneously identify clusters from multiple biomarkers while accounting for within-cluster demographic effects. Cerebrospinal fluid (CSF) biomarkers for AD have detection limits, which create additional challenges. We apply a mixture of regressions with a multivariate truncated Gaussian distribution (also called a censored multivariate Gaussian mixture of regressions or a mixture of multivariate tobit regressions) to over 3,000 participants from the Emory Goizueta Alzheimer's Disease Research Center and the Emory Healthy Brain Study to examine amyloid-beta peptide 1-42 (Abeta42), total tau protein and phosphorylated tau protein in CSF with known detection limits. We address three gaps in the literature on mixtures of regressions with a truncated multivariate Gaussian distribution: software availability, inference and clustering accuracy. We discovered three clusters that tend to align with an AD profile, a normal control profile and a non-AD pathology profile. The CSF profiles differed by race, gender and the genetic marker ApoE4, highlighting the importance of considering demographic factors in unsupervised learning with detection limits. Notably, African American participants in the AD-like group had significantly lower tau burden.
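To make the role of detection limits concrete, the following is a minimal sketch of how a single observation contributes to the likelihood of a censored (tobit) mixture of regressions. It is deliberately simplified to one biomarker, whereas the model described above is multivariate, and it is not the authors' software. The function name `censored_loglik` and all parameter names are hypothetical, chosen only for illustration. A value inside the detection limits contributes its Gaussian density; a value recorded at or beyond a limit contributes the Gaussian probability mass beyond that limit.

```python
import math


def norm_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def norm_cdf(x, mu, sigma):
    """Cumulative distribution function of the same Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def censored_loglik(y, x, lower, upper, weights, betas, sigmas):
    """Log-likelihood and cluster posteriors for one observation.

    y       : recorded biomarker value (set to the limit if censored)
    x       : covariates, including a leading 1.0 for the intercept
    lower   : lower detection limit (assumed known)
    upper   : upper detection limit (assumed known)
    weights : mixing proportions, one per cluster
    betas   : regression coefficients, one list per cluster
    sigmas  : residual standard deviations, one per cluster
    """
    contributions = []
    for w, beta, sigma in zip(weights, betas, sigmas):
        # Cluster-specific mean depends on the covariates.
        mu = sum(b * xi for b, xi in zip(beta, x))
        if y <= lower:
            # Left-censored: only know the true value is below the limit.
            like = norm_cdf(lower, mu, sigma)
        elif y >= upper:
            # Right-censored: only know the true value is above the limit.
            like = 1.0 - norm_cdf(upper, mu, sigma)
        else:
            # Fully observed: ordinary Gaussian density.
            like = norm_pdf(y, mu, sigma)
        contributions.append(w * like)
    total = sum(contributions)
    posterior = [c / total for c in contributions]
    return math.log(total), posterior
```

Summing the returned log-likelihood over participants gives the objective that an EM-type algorithm would maximize, and the returned posteriors are the soft cluster memberships. The sketch does not guard against numerical underflow when every cluster assigns a value negligible probability; a practical implementation would work on the log scale throughout.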