Functional magnetic resonance imaging (fMRI) data contain high levels of noise and artifacts. To avoid contaminating downstream analyses, fMRI-based studies must identify and remove these noise sources before statistical analysis. One common approach is "scrubbing": removing fMRI volumes that are thought to contain high levels of noise. However, existing scrubbing techniques are based on ad hoc measures of signal change. We consider scrubbing via outlier detection, where volumes containing artifacts are treated as multidimensional outliers. Robust multivariate outlier detection methods are proposed using robust distances (RDs), which are related to the Mahalanobis distance. These RDs have a known distribution when the data are i.i.d. normal, and that distribution can be used to determine an outlier threshold; fMRI data, however, violate these assumptions. Here, we develop a robust multivariate outlier detection method that is applicable to non-normal data. The objective is to obtain threshold values for flagging outlying volumes based on their RDs. We propose two threshold candidates that share the same first two steps; the choice between them depends on the researcher's purpose. The main steps are (1) dimension reduction and selection, (2) robust univariate outlier imputation to remove the influence of outliers on the distribution, and (3) estimation of an outlier threshold from an upper quantile of the RD distribution without outliers. The first threshold candidate is an upper quantile of the empirical distribution of RDs obtained from the imputed data. The second threshold candidate estimates the upper quantile of the RD distribution with a nonparametric bootstrap, which accounts for uncertainty in the empirical quantile. We compare our proposed fMRI scrubbing method to motion scrubbing, data-driven scrubbing, and restrictive parametric multivariate outlier detection methods.
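The thresholding step described above can be illustrated with a minimal Python sketch. It assumes the RDs have already been computed, once from the imputed (outlier-free) data and once from the observed data; the dimension reduction, imputation, and RD computation are not shown. The function names, the 0.99 quantile level, the number of bootstrap resamples, and the use of the median to summarize the bootstrap quantiles are all assumptions made for illustration, not details stated in the abstract.

```python
import random
import statistics


def upper_quantile(values, q):
    """Empirical q-quantile, linearly interpolated between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac


def empirical_threshold(rds_imputed, q=0.99):
    """Candidate 1: upper quantile of the RDs from the imputed data."""
    return upper_quantile(rds_imputed, q)


def bootstrap_threshold(rds_imputed, q=0.99, n_boot=1000, seed=0):
    """Candidate 2: nonparametric bootstrap of the upper quantile.

    Assumption: the bootstrap quantiles are summarized by their median.
    A lower or upper percentile of them could be used instead.
    """
    rng = random.Random(seed)
    n = len(rds_imputed)
    boot_quantiles = []
    for _ in range(n_boot):
        # Resample the RDs with replacement, keeping the sample size.
        resample = rng.choices(rds_imputed, k=n)
        boot_quantiles.append(upper_quantile(resample, q))
    return statistics.median(boot_quantiles)


def flag_outliers(rds_observed, threshold):
    """Indices of volumes whose observed RD exceeds the threshold."""
    return [i for i, d in enumerate(rds_observed) if d > threshold]


if __name__ == "__main__":
    # Synthetic RDs for demonstration only; not fMRI data.
    rng = random.Random(1)
    rds_imputed = [rng.gammavariate(2.0, 1.0) for _ in range(200)]
    rds_observed = list(rds_imputed)
    rds_observed[10] = 50.0  # one artificially contaminated volume
    t1 = empirical_threshold(rds_imputed)
    t2 = bootstrap_threshold(rds_imputed)
    print(flag_outliers(rds_observed, t1), flag_outliers(rds_observed, t2))
```

In this sketch the threshold is estimated from the imputed data and then applied to the RDs of the observed data, so that artifacts do not inflate the quantile used to detect them.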