This paper presents a comprehensive, high-quality collection of functional human brain network data for research at the intersection of neuroscience, machine learning, and graph analytics. Anatomical and functional MRI images have been used to understand the functional connectivity of the human brain and are particularly important for identifying underlying neurodegenerative and neurodevelopmental conditions such as Alzheimer's disease, Parkinson's disease, and autism. Recently, studying the brain in the form of brain networks with machine learning and graph analytics has become increasingly popular, especially for predicting the early onset of these conditions. A brain network, represented as a graph, retains rich structural and positional information that traditional examination methods cannot capture. However, the lack of publicly accessible brain network data hinders data-driven exploration. One of the main difficulties lies in the complicated, domain-specific preprocessing steps and the intensive computation required to convert MRI images into brain networks. We bridge this gap by collecting a large number of MRI images from public databases and a private source, working with domain experts to make sensible design choices, and preprocessing the MRI images to produce a collection of brain network datasets. The datasets originate from 6 different sources, cover 4 brain conditions, and comprise a total of 2,702 subjects. We test our graph datasets on 12 machine learning models to provide baselines and validate the data quality on a recent graph analysis model. To lower the barrier to entry and promote research in this interdisciplinary field, we release our brain network data and complete preprocessing details, including code, at https://doi.org/10.17608/k6.auckland.21397377 and https://github.com/brainnetuoa/data_driven_network_neuroscience.
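The final step of turning preprocessed imaging data into a brain network can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used to build the released datasets: it assumes the MRI volumes have already been preprocessed and averaged into one BOLD time series per region of interest (ROI), uses Pearson correlation between ROI signals as edge weights, and drops weak edges with a threshold. The function name `functional_connectivity_graph` and the threshold value of 0.5 are hypothetical choices made for this example.

```python
import numpy as np


def functional_connectivity_graph(roi_timeseries: np.ndarray, threshold: float = 0.5):
    """Build a weighted, undirected brain network from ROI time series.

    roi_timeseries: array of shape (n_timepoints, n_rois), one column per ROI
        (assumed already preprocessed and averaged within each ROI).
    threshold: hypothetical cut-off on |Pearson r|; weaker edges are removed.
    Returns (adjacency, edges): an (n_rois, n_rois) weighted adjacency matrix
    and an edge list of (i, j, weight) tuples with i < j.
    """
    # Pearson correlation between every pair of ROI signals (columns are variables).
    corr = np.corrcoef(roi_timeseries, rowvar=False)
    # A node is not connected to itself, so zero out the diagonal.
    np.fill_diagonal(corr, 0.0)
    # Keep only strong positive or negative correlations as edges.
    adjacency = np.where(np.abs(corr) >= threshold, corr, 0.0)
    # Upper-triangle edge list, convenient for graph libraries.
    n = adjacency.shape[0]
    edges = [
        (i, j, float(adjacency[i, j]))
        for i in range(n)
        for j in range(i + 1, n)
        if adjacency[i, j] != 0.0
    ]
    return adjacency, edges


if __name__ == "__main__":
    # Toy data: 8 time points, 4 hypothetical ROIs.
    a = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
    b = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)  # uncorrelated with a
    ts = np.column_stack([a, 2 * a + 1, -a, b])
    adjacency, edges = functional_connectivity_graph(ts, threshold=0.5)
    print(edges)
```

In the toy example, the first three ROIs are perfectly correlated or anti-correlated and become connected, while the fourth ROI is uncorrelated with all of them and remains isolated. Real pipelines differ in the parcellation atlas, the connectivity measure, and whether and how the graph is thresholded.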