Independent-Component-Based Encoding Models of Brain Activity During Story Comprehension

Encoding models provide a powerful framework for linking continuous stimulus features to neural activity; however, traditional voxelwise approaches are limited by measurement noise, inter-subject variability, and redundancy arising from spatially correlated voxels encoding overlapping neural signals. Here, we propose an independent component (IC)-based encoding framework that dissociates stimulus-driven and noise-driven signals in fMRI data. We decompose continuous fMRI data from naturalistic story listening into ICs using one subset of the data, and train encoding models on independent data to predict IC time series from large language model representations of linguistic input. Across subjects, a subset of ICs exhibited consistently high predictivity. These ICs were spatially and temporally consistent across subjects and included cognitive networks known to respond during story listening (auditory and language). Auditory component time series were strongly correlated with acoustic stimulus features, highlighting the interpretability of identified component time series. Components identified as noise or motion-related artifacts by ICA-AROMA showed uniformly poor predictive performance, confirming that highly predicted components reflect genuine stimulus-related neural signals rather than confounds. Overall, IC-based encoding models enable analyses at the level of functional networks, accommodating the variability in network locations across individuals and providing interpretable results that are easy to compare across subjects. Code provided at: https://github.com/kamyahari/IC-Encoding-Models.git
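The encoding step described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that) and rests on several assumptions: the IC time series are taken as already estimated on a separate subset of the data, the regression is a closed-form ridge with time-delayed features (the abstract does not name the regression method), and random vectors stand in for large language model representations. All function names, the delays, and the regularization strength are hypothetical choices for illustration.

```python
import numpy as np

def make_delayed(features, delays=(1, 2, 3, 4)):
    """Stack time-shifted copies of the features (a simple FIR model of response lag)."""
    n_t, n_f = features.shape
    out = np.zeros((n_t, n_f * len(delays)))
    for i, d in enumerate(delays):
        out[d:, i * n_f:(i + 1) * n_f] = features[:n_t - d]
    return out

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def columnwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = (A - A.mean(0)) / A.std(0)
    B = (B - B.mean(0)) / B.std(0)
    return (A * B).mean(0)

def ic_encoding_scores(feat_train, ic_train, feat_test, ic_test,
                       alpha=1.0, delays=(1, 2, 3, 4)):
    """Fit on training data, return held-out prediction correlation per IC."""
    W = fit_ridge(make_delayed(feat_train, delays), ic_train, alpha)
    pred = make_delayed(feat_test, delays) @ W
    return columnwise_corr(pred, ic_test)

# Synthetic demonstration (assumed data, not results from the paper).
rng = np.random.default_rng(0)
n_t, n_f = 600, 5
features = rng.standard_normal((n_t, n_f))      # stand-in for LLM features
w_true = np.array([1.0, -0.5, 0.8, 0.3, -1.2])  # hypothetical tuning weights

# IC 0: stimulus-driven, responds to the features with a lag of 2 samples.
stim_ic = np.zeros(n_t)
stim_ic[2:] = features[:-2] @ w_true
stim_ic += 0.1 * rng.standard_normal(n_t)
# IC 1: pure noise, unrelated to the stimulus.
noise_ic = rng.standard_normal(n_t)
ics = np.column_stack([stim_ic, noise_ic])

# Train on the first 400 samples, evaluate on the remaining 200.
scores = ic_encoding_scores(features[:400], ics[:400],
                            features[400:], ics[400:])
print("held-out correlation per IC:", scores)
```

In this toy setting the stimulus-driven component should be predicted well and the noise component poorly, mirroring the contrast the abstract reports between highly predicted ICs and those flagged as artifacts.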
