NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics

Machine learning provides a valuable tool for analyzing high-dimensional functional neuroimaging data, and is proving effective in predicting various neurological conditions, psychiatric disorders, and cognitive patterns. In functional magnetic resonance imaging (fMRI) research, interactions between brain regions are commonly modeled using graph-based representations. The potency of graph machine learning methods has been established across myriad domains, marking a transformative step in data interpretation and predictive modeling. Yet, despite their promise, transferring these techniques to the neuroimaging domain has been challenging because of the large number of potential preprocessing pipelines and the large parameter search space for graph-based dataset construction. In this paper, we introduce NeuroGraph, a collection of graph-based neuroimaging datasets, and demonstrate its utility for predicting multiple categories of behavioral and cognitive traits. We explore the dataset generation search space by crafting 35 datasets that encompass static and dynamic brain connectivity, and we run more than 15 baseline methods for benchmarking. Additionally, we provide generic frameworks for learning on both static and dynamic graphs. Our extensive experiments lead to several key observations. Notably, using correlation vectors as node features, incorporating a larger number of regions of interest, and employing sparser graphs lead to improved performance. To foster further advancements in graph-based, data-driven neuroimaging analysis, we offer a comprehensive open-source Python package that includes the benchmark datasets, baseline implementations, model training, and standard evaluation.
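
To make the graph construction choices named above concrete, the following is a minimal sketch of one way a static connectivity graph might be built from region-of-interest (ROI) time series: each node's feature vector is its row of the correlation matrix, and sparsity is controlled by keeping only the strongest correlations. It is an illustration under stated assumptions, not the NeuroGraph package's implementation; the function name `build_static_graph`, the `density` parameter, and the choice to threshold on the largest signed correlations are all hypothetical, and the input is synthetic noise rather than real fMRI data.

```python
import numpy as np


def build_static_graph(timeseries, density=0.05):
    """Build a sparse functional-connectivity graph (illustrative sketch).

    timeseries: array of shape (n_timepoints, n_rois).
    density: fraction of ROI pairs kept as undirected edges (assumed knob).
    Returns (node_features, edge_index).
    """
    # Pearson correlation between every pair of ROIs: shape (n_rois, n_rois).
    corr = np.corrcoef(timeseries, rowvar=False)
    # Constant time series yield NaN correlations; treat them as zero.
    corr = np.nan_to_num(corr)

    # Correlation vectors as node features: row i describes ROI i.
    node_features = corr.copy()

    # Candidate edges are the ROI pairs above the diagonal (no self-loops).
    n_rois = corr.shape[0]
    rows, cols = np.triu_indices(n_rois, k=1)
    weights = corr[rows, cols]

    # Keep the top `density` fraction of pairs by correlation value.
    n_keep = max(1, int(round(density * weights.size)))
    top = np.argsort(weights)[-n_keep:]
    src, dst = rows[top], cols[top]

    # Store each undirected edge in both directions, as a (2, n_edges) array.
    edge_index = np.stack(
        [np.concatenate([src, dst]), np.concatenate([dst, src])]
    )
    return node_features, edge_index


# Usage on synthetic data: 200 time points, 20 ROIs, keep 10% of the pairs.
rng = np.random.default_rng(0)
ts = rng.standard_normal((200, 20))
x, edge_index = build_static_graph(ts, density=0.1)
print(x.shape, edge_index.shape)
```

Lowering `density` yields a sparser graph, and using a parcellation with more ROIs enlarges both the node set and the length of each correlation vector. A dynamic graph could be sketched in the same way by applying the function to successive sliding windows of the time series, although the window length and stride would be further assumptions.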