Benchmarking the hundreds of functional connectivity (FC) modeling methods on large-scale fMRI datasets is critical for reproducible neuroscience. However, the combinatorial explosion of model-data pairings makes exhaustive evaluation computationally prohibitive, preventing such assessments from becoming a routine pre-analysis step. To break this bottleneck, we reframe the challenge of FC benchmarking as selecting a small, representative core-set whose sole purpose is to preserve the relative performance ranking of FC operators. We formalize this as a ranking-preserving subset selection problem and propose Structure-aware Contrastive Learning for Core-set Selection (SCLCS), a self-supervised framework for selecting these core-sets. SCLCS first uses an adaptive Transformer to learn each sample's unique FC structure. It then introduces a novel Structural Perturbation Score (SPS) to quantify the stability of these learned structures during training, identifying samples that represent foundational connectivity archetypes. Finally, while SCLCS identifies stable samples via a top-k ranking, we further introduce a density-balanced sampling strategy as a necessary correction to promote diversity, ensuring the final core-set is both structurally robust and distributionally representative. On the large-scale REST-meta-MDD dataset, SCLCS preserves the ground-truth model ranking with just 10% of the data, outperforming state-of-the-art (SOTA) core-set selection methods by up to 23.2% in ranking consistency (nDCG@k). To our knowledge, this is the first work to formalize core-set selection for FC operator benchmarking, thereby making large-scale operator comparisons a feasible and integral part of computational neuroscience. Code is publicly available at https://github.com/lzhan94swu/SCLCS
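The ranking-consistency metric cited above, nDCG@k, compares the operator ranking produced on the core-set against the ground-truth ranking from the full dataset. A minimal sketch follows; the choice of relevance scores (here, each operator's full-dataset benchmark performance) is an assumption for illustration, not the paper's exact evaluation protocol.

```python
import math

def ndcg_at_k(predicted_ranking, true_relevance, k):
    """Normalized discounted cumulative gain at cutoff k.

    predicted_ranking: operator ids ordered by performance on the core-set.
    true_relevance: dict mapping operator id -> ground-truth score
        (assumed here to be full-dataset benchmark performance).
    """
    def dcg(scores):
        # Discounted cumulative gain over the top-k positions.
        return sum(s / math.log2(i + 2) for i, s in enumerate(scores[:k]))

    gains = [true_relevance[op] for op in predicted_ranking]
    ideal = sorted(true_relevance.values(), reverse=True)
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0
```

A perfect core-set ranking yields nDCG@k = 1.0; rankings that demote truly strong operators are penalized more heavily at the top positions due to the logarithmic discount.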