With the growing interest in foundation models for brain signals, graph-based pretraining has emerged as a promising paradigm for learning transferable representations from connectome data. However, existing contrastive and masked-autoencoder methods typically rely on naive random dropping or masking for augmentation, which is ill-suited to brain graphs and hypergraphs because it disrupts semantically meaningful connectivity patterns. Moreover, commonly used graph-level readout and reconstruction schemes fail to capture global structural information, limiting the robustness of the learned representations. In this work, we propose a unified diffusion-based pretraining framework that addresses both limitations. First, diffusion guides structure-aware dropping and masking strategies, preserving brain-graph semantics while maintaining effective pretraining diversity. Second, diffusion enables topology-aware graph-level readout and node-level global reconstruction by allowing graph embeddings and masked nodes to aggregate information from globally related regions. Extensive experiments on multiple neuroimaging datasets, covering over 25,000 subjects and 60,000 scans and spanning various mental disorders and brain atlases, demonstrate consistent performance improvements.
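The diffusion-guided masking idea can be illustrated with a small sketch. The abstract does not fix the diffusion operator, so this example assumes a personalized-PageRank-style diffusion as the structural scorer, and masks the nodes that receive the least diffusion mass so that hub regions carrying the graph's connectivity semantics are preserved; all function names are illustrative, not the paper's API.

```python
import numpy as np

def ppr_diffusion(adj, alpha=0.15):
    """Personalized-PageRank diffusion: S = alpha * (I - (1 - alpha) * D^-1 A)^-1.

    Row i of S is the diffusion (PPR) vector seeded at node i.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                       # avoid division by zero for isolated nodes
    trans = adj / deg                         # row-normalized transition matrix D^-1 A
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * trans)

def diffusion_guided_mask(adj, mask_ratio=0.3, alpha=0.15):
    """Select nodes to mask, preferring those with the LOWEST diffusion centrality.

    Unlike uniform random masking, this keeps globally well-connected regions
    intact, which is the structure-aware behavior the framework aims for.
    """
    S = ppr_diffusion(adj, alpha)
    score = S.sum(axis=0)                     # diffusion mass each node receives
    n_mask = int(mask_ratio * adj.shape[0])
    return np.argsort(score)[:n_mask]         # mask the least central nodes
```

On a star graph, for example, the hub receives the most diffusion mass and is never masked, whereas uniform random masking would drop it with the same probability as any leaf.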