We propose a novel non-parametric estimator of the correlation between grouped measurements of a quantity in the presence of noise. This work is primarily motivated by functional brain network construction from fMRI data, where brain regions correspond to groups of spatial units, and the correlation between pairs of regions defines the network. The challenge is that, with classical approaches, both noise and intra-regional correlation make the estimation of inter-regional correlation inconsistent. While some existing methods handle one of these issues, no non-parametric approach tackles both simultaneously. To address this problem, we propose a trade-off between two procedures: correlating regional averages, which is not robust to intra-regional correlation; and averaging pairwise inter-regional correlations, which is not robust to noise. To that end, we project the data onto a space where Euclidean distance serves as a proxy for sample correlation. We then use hierarchical clustering to group highly correlated variables within each region before estimating inter-regional correlation. We provide consistency results and show empirically that our approach outperforms several popular methods in terms of estimation quality. Illustrations on real-world datasets further demonstrate its effectiveness.
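The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the average-linkage choice, the distance threshold, and the final step of averaging the correlations between cluster averages are all assumptions made for illustration. The sketch only relies on the fact that, for centered columns scaled to unit norm, the squared Euclidean distance between two columns equals 2(1 - r), where r is their sample correlation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def unit_columns(x):
    """Center each column and scale it to unit Euclidean norm.

    Assumes no column is constant (otherwise the norm is zero).
    """
    centered = x - x.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=0)


def cluster_averages(region, threshold):
    """Average the columns of one region within each hierarchical cluster.

    region: array of shape (n_samples, n_variables), with n_variables >= 2.
    threshold: hypothetical cut height on the Euclidean distance scale.
    """
    z = unit_columns(region)
    # For centered unit-norm columns, ||z_i - z_j||^2 = 2 * (1 - r_ij),
    # so a small Euclidean distance means a high sample correlation.
    tree = linkage(z.T, method="average", metric="euclidean")
    labels = fcluster(tree, t=threshold, criterion="distance")
    return np.column_stack(
        [region[:, labels == k].mean(axis=1) for k in np.unique(labels)]
    )


def inter_regional_correlation(region_a, region_b, threshold=1.0):
    """Mean sample correlation between cluster averages of two regions."""
    a = cluster_averages(region_a, threshold)
    b = cluster_averages(region_b, threshold)
    p = a.shape[1]
    # corrcoef stacks the rows of both inputs; keep the cross-region block.
    cross = np.corrcoef(a.T, b.T)[:p, p:]
    return float(cross.mean())
```

In this sketch the threshold controls the trade-off: a threshold above 2 (the largest possible distance between unit-norm columns) puts each region in a single cluster and reduces to correlating regional averages, while a threshold near zero keeps every variable separate and reduces to averaging pairwise inter-regional correlations.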