Probing the computational underpinnings of subjective experience, or qualia, remains a central challenge in cognitive neuroscience. This project addresses that challenge through a rigorous comparison of the representational geometry of color qualia between state-of-the-art AI models and the human brain. Using a unique fMRI dataset collected with a "no-report" paradigm, we apply Representational Similarity Analysis (RSA) to compare diverse vision models against neural activity under two conditions: pure perception ("no-report") and task-modulated perception ("report"). Our analysis yields three principal findings. First, nearly all models align better with neural representations of pure perception, suggesting that the cognitive processes involved in task execution are not captured by current feedforward architectures. Second, our analysis reveals a critical interaction between training paradigm and architecture, challenging the simple assumption that Contrastive Language-Image Pre-training (CLIP) universally improves neural plausibility: in our direct comparison, this multi-modal training method enhanced brain alignment for a vision transformer (ViT) yet had the opposite effect on a ConvNet. Third, we contribute a new benchmark task for color qualia, packaged in a Brain-Score-compatible format, which reveals a fundamental divergence in the inductive biases of artificial and biological vision systems and offers clear guidance for developing more neurally plausible models.
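To make the comparison concrete, the following is a minimal sketch of the RSA procedure described above: build a representational dissimilarity matrix (RDM) from model activations and from fMRI voxel patterns for the same color stimuli, then correlate the two. All variable names, shapes, and the choice of correlation distance with a Spearman comparison are illustrative assumptions, not the project's exact pipeline.

```python
# Minimal RSA sketch (illustrative only; shapes and distance/comparison
# metrics are assumptions, not the project's actual configuration).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """Condensed representational dissimilarity matrix from an
    (n_stimuli x n_features) activation matrix, using correlation distance."""
    return pdist(patterns, metric="correlation")

def rsa_score(model_acts, fmri_patterns):
    """Spearman correlation between the model RDM and the neural RDM,
    computed over the upper-triangular (off-diagonal) entries."""
    rho, _ = spearmanr(rdm(model_acts), rdm(fmri_patterns))
    return rho

# Hypothetical usage: compare one model layer against the "no-report" and
# "report" fMRI conditions. Random arrays stand in for real data.
n_stimuli, n_units, n_voxels = 14, 512, 2000
model_layer = np.random.randn(n_stimuli, n_units)       # stand-in model activations
fmri_no_report = np.random.randn(n_stimuli, n_voxels)   # stand-in voxel patterns
fmri_report = np.random.randn(n_stimuli, n_voxels)

print("no-report alignment:", rsa_score(model_layer, fmri_no_report))
print("report alignment:   ", rsa_score(model_layer, fmri_report))
```

In the actual analysis, the per-model score under each condition would be compared across architectures and training regimes (e.g. CLIP-trained versus supervised ViTs and ConvNets).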