This study uses the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to explore early detection and disease progression in Alzheimer's disease (AD). Our data preprocessing imputes missing values with a random forest algorithm and handles outliers and invalid entries, so that the limited data available can be used as fully as possible. Using Spearman correlation analysis, we identify features that are strongly correlated with AD diagnosis. We then build and test three machine learning models on these features: random forest, XGBoost, and support vector machine (SVM). Of the three, XGBoost achieves the best diagnostic performance, with an accuracy of 91%. Overall, the study addresses the challenge of missing data and offers insights into the early detection of Alzheimer's disease that are of both research and practical value.
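To make the feature-screening step concrete, the following is a minimal sketch of ranking features by their Spearman correlation with a diagnosis label. It is not the study's actual code: the feature names, the toy values, the ordinal label coding, and the 0.5 cutoff are all assumptions made for illustration, since the abstract does not state them. The correlation is written in plain Python (Pearson correlation applied to tie-averaged ranks) so that the sketch has no dependencies.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with the entry at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column is treated as uncorrelated here
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def select_features(columns, labels, threshold=0.5):
    """Keep features whose |Spearman rho| with the label meets the threshold.

    The threshold of 0.5 is an illustrative assumption, not a value
    taken from the study.
    """
    scores = {name: spearman(vals, labels) for name, vals in columns.items()}
    kept = [name for name, rho in scores.items() if abs(rho) >= threshold]
    kept.sort(key=lambda name: abs(scores[name]), reverse=True)
    return kept, scores


# Toy data only (hypothetical, not ADNI values).
# Assumed ordinal label coding: 0 = control, 1 = mild impairment, 2 = AD.
labels = [0, 0, 0, 1, 1, 1, 2, 2]
columns = {
    "feature_a": [30, 29, 28, 27, 25, 24, 20, 18],  # falls as the label rises
    "feature_b": [1, 2, 3, 4, 5, 6, 7, 8],          # rises with the label
    "feature_noise": [5, 1, 4, 2, 5, 1, 3, 3],      # no clear trend
}

kept, scores = select_features(columns, labels)
for name in columns:
    print(f"{name}: rho = {scores[name]:+.3f}")
print("selected:", kept)
```

On real data, `scipy.stats.spearmanr` computes the same coefficient along with a p-value. The selected columns would then be passed on to the classifiers being compared.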