Dementia is a complex syndrome that impairs cognitive and emotional function, with Alzheimer's disease being its most common form. This study aims to improve dementia prediction by applying machine learning (ML) techniques to patient health data. Supervised learning algorithms were applied, including K-Nearest Neighbors (KNN), Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), and Gaussian Process Classifiers. To address class imbalance and improve model performance, techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) and Term Frequency-Inverse Document Frequency (TF-IDF) vectorization were employed. Among the models, LDA achieved the highest testing accuracy, at 98%. The study also highlights the importance of model interpretability and the correlation of dementia with features such as the presence of the APOE-epsilon4 allele and chronic conditions like diabetes. It advocates further ML innovation, particularly the integration of explainable AI approaches, to improve predictive capabilities in dementia care.
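The abstract names SMOTE but does not describe how it was configured or which implementation was used. As a rough illustration of the idea only, the following pure-Python sketch generates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote_oversample`, the parameter `k`, and the toy feature values are all hypothetical and are not taken from the study.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each placed on the line segment between
    a randomly chosen minority point and one of its k nearest minority
    neighbours (the core interpolation step of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features per record (not from the study).
majority = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2], [0.1, 0.4]]
minority = [[0.8, 0.9], [0.9, 0.8], [0.7, 0.7]]

# Generate just enough synthetic points to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the data used to report testing accuracy.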