Neuroimaging data allow researchers to model the relationship between multivariate patterns of brain activity and outcomes related to mental states and behaviors. However, outlying participants can undermine the generalizability of these models and jeopardize the validity of downstream statistical analysis. To date, the ability to detect and account for participants who unduly influence various model selection approaches has been sorely lacking. Motivated by a task-based functional magnetic resonance imaging (fMRI) study of thermal pain, we propose a diagnostic measure applicable to a number of different model selectors and establish its asymptotic distribution. A high-dimensional clustering procedure is further combined with this measure to detect multiple influential observations. In a series of simulations, our proposed method demonstrates clear advantages over existing methods in detection performance, leading to improved prediction and variable selection. Application of our method to data from the thermal pain study illustrates the influence of outlying participants, in particular with regard to differences in activation between low and intense pain conditions. Removing the detected observations allows for the selection of an interpretable model with high predictive power. Though inspired by the fMRI-based thermal pain study, our methods are broadly applicable to other high-dimensional data types.