An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder

Autism Spectrum Disorder (ASD) is a neurological disorder characterized by difficulties with social interaction and communication and by repetitive behaviors. Although its origins are primarily genetic, early detection is crucial, and machine learning offers a promising route to faster and more cost-effective diagnosis. This study applies a range of machine learning methods to identify the traits most indicative of ASD, with the aim of improving and automating the diagnostic process. We study eight state-of-the-art classification models to determine their effectiveness in ASD detection. We evaluate the models using accuracy, precision, recall, specificity, F1-score, area under the curve (AUC), kappa, and log loss to find the best classifier for these binary datasets. For the children dataset, the SVM and LR models achieve the highest accuracy, 100%; for the adult dataset, the LR model achieves the highest accuracy, 97.14%. Our proposed ANN model achieves the highest accuracy, 94.24%, on the new combined dataset when hyperparameters are precisely tuned for each model. Because almost all of the classification models, which rely on true labels, achieve high accuracy, we also examine five popular clustering algorithms to understand model behavior when true labels are unavailable. We compute the Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Silhouette Coefficient (SC) to select the best clustering model. Our evaluation finds that spectral clustering outperforms all other benchmark clustering models on NMI and ARI while achieving an SC comparable to the best value, which is obtained by k-means. The implemented code is available on GitHub.
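To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how several of them can be computed from scratch. It is not the authors' implementation: the function names are hypothetical, labels are assumed to be encoded as 0 (non-ASD) and 1 (ASD), and the toy inputs in the usage lines are invented for illustration only. NMI and the Silhouette Coefficient are omitted for brevity.

```python
from collections import Counter
from math import comb, log


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for 0/1 labels (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "kappa": kappa}


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the predicted P(label == 1)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * log(p) + (1 - t) * log(1 - p)
    return total / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two partitions, computed from pair counts."""
    n = len(labels_true)
    cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = rows * cols / comb(n, 2)
    max_index = (rows + cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (cells - expected) / (max_index - expected)


# Toy usage with made-up labels (not data from the study).
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy_scores = binary_metrics(toy_true, toy_pred)
```

Specificity and kappa are worth reporting alongside accuracy here because screening datasets are often class-imbalanced, in which case accuracy alone can look high even for a classifier that rarely identifies the minority class.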

