Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by difficulties with social interaction and communication and by repetitive behaviors. Although its origin is primarily genetic, early detection is crucial, and machine learning offers a promising route to faster and more cost-effective diagnosis. This study applies a range of machine learning methods to identify key ASD traits, with the aim of improving and automating the diagnostic process. We study eight state-of-the-art classification models to determine their effectiveness in ASD detection. We evaluate them using accuracy, precision, recall, specificity, F1-score, area under the curve (AUC), kappa, and log loss to find the best classifier for these binary datasets. On the children dataset, the support vector machine (SVM) and logistic regression (LR) models achieve the highest accuracy, 100%; on the adult dataset, the LR model achieves the highest accuracy, 97.14%. On the new combined dataset, our proposed artificial neural network (ANN) model achieves the highest accuracy, 94.24%, when the hyperparameters of each model are precisely tuned. Because almost all of the classification models, which rely on true labels, achieve high accuracy, we also examine five popular clustering algorithms to understand model behavior when true labels are unavailable. We compute the Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Silhouette Coefficient (SC) to select the best clustering model. Spectral clustering outperforms all other benchmark clustering models in terms of NMI and ARI, while its SC is comparable to the optimal SC achieved by k-means. The code is available on GitHub.
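To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how the binary classification metrics and the Adjusted Rand Index could be computed from first principles. It is not the study's implementation: the function names `binary_metrics` and `adjusted_rand_index`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from the ASD datasets.

```python
import math
from collections import Counter


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Hypothetical helper: metrics for binary labels (0/1) and predicted
    probabilities of the positive class. The threshold is an assumption."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(y_true)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Log loss, with probabilities clipped away from 0 and 1.
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                    for t, p in zip(y_true, clipped)) / n

    # AUC as the probability that a random positive outranks a random
    # negative (ties count as one half).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if pos and neg:
        wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
                   for a in pos for b in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "auc": auc,
            "kappa": kappa, "log_loss": log_loss}


def adjusted_rand_index(labels_a, labels_b):
    """Hypothetical helper: ARI between two labelings of the same samples."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_cells = sum(math.comb(c, 2) for c in contingency.values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
metrics = binary_metrics(y_true, y_prob)

# ARI ignores the cluster names, so a relabeled perfect clustering scores 1.
ari = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Writing the metrics out this way shows why they can disagree: accuracy, precision, recall, specificity, F1, and kappa depend only on the thresholded confusion matrix, whereas AUC and log loss use the predicted probabilities directly. In practice a library such as scikit-learn would normally be used instead of hand-written functions.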